sync semantic embeddings

Levi Neuwirth 2026-04-11 14:35:01 -04:00
parent 7e0b4c3a53
commit 256808d2b2
7 changed files with 1521 additions and 0 deletions

1
*Minibuf-1* Normal file

@ -0,0 +1 @@
Search:

739
audit.md Normal file

@ -0,0 +1,739 @@
# levineuwirth.org — Comprehensive Audit
**Auditor:** Independent code review (read-only, no changes made)
**Date:** 2026-04-09
**Scope:** ~15,400 lines across Haskell build system (`build/**/*.hs`), Pandoc filters (`build/Filters/*.hs`), static JavaScript (`static/js/*.js`), CSS (`static/css/*.css`), templates (`templates/**`), Python tooling (`tools/*.py`), shell scripts (`tools/*.sh`), `Makefile`, cabal/pyproject configuration, and repository hygiene.
**Methodology:** Direct reading of critical modules (`Site.hs`, `Contexts.hs`, `Stats.hs`, `Backlinks.hs`, `Compilers.hs`, `Citations.hs`, `Stability.hs`, `Catalog.hs`, `Commonplace.hs`, `Filters/*.hs`, `Makefile`, shell scripts, `embed.py`); parallel exploration of JS, CSS, templates, and the larger Python tools.
Each finding is labeled by **severity** (`CRITICAL`, `HIGH`, `MEDIUM`, `LOW`, `NIT`) and cites file + line. The codebase is generally well-written — architecture is clean, modules are tightly scoped, YAML/frontmatter is parsed defensively, and escaping is applied in most HTML rendering sites. Most findings are local issues; the codebase does not exhibit systemic rot.
---
## Executive summary
**Confirmed correctness bugs (by impact):**
| # | File | Severity | Summary |
|---|------|----------|---------|
| 1 | `build/Filters/Images.hs:110` | **CRITICAL** | `lowerExt` is wrong: it returns `"image."` for `"image.jpg"`. Every local raster fails `isLocalRaster`, so **no `<picture>` / WebP wrapping happens site-wide**. The entire WebP pipeline is dead code. |
| 2 | `build/Commonplace.hs:126-131` | **HIGH** | Operator-precedence bug in `renderChronoView`: `a ++ if c then x else y ++ z` parses as `a ++ (if c then x else (y ++ z))`, so `</div>` is never emitted when the commonplace book is empty → unclosed tag. |
| 3 | `tools/embed.py:68-73` | **HIGH** | Root `index.html` yields URL `"/./"` instead of `"/"`. Homepage is never matched by `SimilarLinks.hs`, so the "Related" block never renders on the home page. |
| 4 | `build/Authors.hs:50` | **HIGH** | `allContent` pattern does not include `content/essays/*/index.md` (directory-form essays). Author pages silently omit those essays. Compare against `Tags.hs:69`, which *does* include them. |
| 5 | `build/Filters/Score.hs:40` | **HIGH** | `TIO.readFile fullPath` is called with no existence check and no exception catch. A missing SVG aborts the entire build with a bare `openFile: does not exist` — no file name context, no graceful fallback. |
| 6 | `build/Filters/Viz.hs:96-99` | **HIGH** | Same pattern: `readProcessWithExitCode "python3" [fullPath]` runs even when `fullPath` doesn't exist; the only signal the author gets is a generic "non-zero exit". |
| 7 | `build/Filters/Sidenotes.hs:38` | **HIGH** | Sidenote labels wrap after the 26th note: `` (n - 1) `mod` 26 `` turns note 27 into `a` again, creating duplicate `id="sn-a"` / `id="snref-a"` across the same document. Breaks in-page links and screen-readers. |
| 8 | `build/Filters/Images.hs:77` | **MEDIUM** | `passedKvs` filters only `loading` and `data-lightbox`, but not `id`, `class`, `alt`, or `title` — all of which are already emitted explicitly above. Any author-set `id=` or `class=` kv on an image is emitted **twice** in the `<img>`, producing invalid HTML (`<img id="x" id="x">`). |
| 9 | `build/Contexts.hs:263-264` | **MEDIUM** | `confidenceTrendField` uses `xs !! (length xs - 2)` (O(n) indexing) and `last xs`. They are guarded by a length check so they're safe, but this is a partial idiom in a module that otherwise uses total patterns. |
| 10 | `build/Filters/Links.hs:59` | **MEDIUM** | `` not ("levineuwirth.org" `T.isInfixOf` url) `` is a substring match. `https://evil-levineuwirth.org.attacker.com` is classified as *internal*, skipping `rel=noopener noreferrer target=_blank`. |
**Defense-in-depth findings:**
- `build/Filters/Transclusion.hs:41` interpolates the author-controlled `sec` section name into a `data-section="..."` attribute with no escaping. In a static site where all Markdown is author-authored this is not an exploitable XSS, but it is a raw-HTML injection primitive — a stray `"` in a section name will break markup, and any future lowering of the "author is trusted" assumption (PRs, multi-author site, user submissions) turns it into one.
- `build/Stats.hs:161-169` implements a correct URL allowlist (`isSafeUrl`) but accepts `"/"` as a prefix, which also matches `//evil.com` (protocol-relative URLs). Mostly cosmetic here since inputs come from Hakyll-computed routes, but the allowlist comment claims strict defense and this is a hole.
- Two different `authorSlugify` / `nameOf` implementations exist (`Authors.hs:30-39` and `Contexts.hs:147-154`). They'll drift the moment one is edited.
- Five Haskell copies of `escHtml`: `Utils.hs:18-26` (the "real" one), `Filters/Images.hs:135-142`, `Filters/Score.hs:88-92`, `Filters/Smallcaps.hs` (per the filter audit), and `Filters/Viz.hs:178-182`, plus identical ones in JS (`annotations.js`, `popups.js`, `semantic-search.js`). Any fix must be made in eight places.
**Repository hygiene:**
- `.env` is gitignored and not tracked — good.
- `~5.4 MB` of `.docx` binaries (`BeyondComorbidityIndices*.docx`) sit in the repo root, untracked but present; they're build input for the new essay but should be moved under `paper/` or similar rather than the project root.
- `HOMEPAGE.md~` (zero-byte editor backup) is on disk; gitignore catches it, but it should be removed.
- `content/modern_idolatry.md` is untracked and not under `content/drafts/` — either it's a ready-to-publish draft that escaped the drafts workflow, or a forgotten scratch file.
- `build/Metadata.hs` contains only `module Metadata where` — a no-op placeholder dragged along since Phase 2. Delete or populate.
- `build/Filters/Math.hs` and `build/Filters/Dropcaps.hs` are `apply = id` placeholders; fine as TODO anchors, but `-Wno-unused-imports` in `levineuwirth.cabal` is masking warnings that would otherwise tell you so.
---
## 1. Haskell build system (`build/*.hs`)
### 1.1 `Site.hs`
**L-1.1.1 — LOW — Blog posts do not support directory-form pages.** `content/blog/*.md` (line 249) only matches flat posts; compare to essays and poetry, which accept both flat and `*/index.md`. If the author ever wants to co-locate blog assets, they'll have to edit both the rule and `Backlinks.hs:allContent`.
**L-1.1.2 — LOW — Backlinks pattern drift.** `allContent` in `Backlinks.hs:200-208`, `Authors.hs:50`, `Tags.hs:69`, and the implicit patterns in `Site.hs` all enumerate the same content types, slightly differently. `Authors.hs` omits directory essays; `Backlinks.hs` omits `fiction/*/index.md`; `Tags.hs` includes both essay forms but not fiction. This divergence is the root of finding #4 (Authors missing directory essays) and will continue to produce silent bugs. Extract one canonical `Patterns.hs`.
**L-1.1.3 — LOW — `draftEssays` / `isDev` ties build correctness to an environment variable read at rule registration.** `isDev <- preprocess $ ... lookupEnv "SITE_ENV"` runs once at startup. That is correct, but a developer toggling `SITE_ENV` mid-`cabal run site -- watch` will be confused. Worth a comment at the `preprocess` call, not just near `draftEssays`.
**L-1.1.4 — LOW — `library.html` loads all content four times.** `portalList` calls `loadAll essays`, `loadAll posts`, `loadAll fiction`, `loadAll poetry` **inside the inner list body**, which is re-evaluated for each of the eight `portalList` calls. That's 32 `loadAll` calls for eight portals. Hakyll caches identifiers so the impact is bounded, but it's still unnecessary work; hoist the loads into the outer `compile` block.
**NIT — `random-pages.json` (line 445).** The type annotation `:: Compiler [Item String]` on every binding is load-bearing because without it Hakyll can't infer the snapshot type. Fine, but a quick comment would save a future reader from thinking they're decorative.
### 1.2 `Contexts.hs`
**M-1.2.1 — MEDIUM — `authorLinksField` produces empty-slug URLs for empty author names.** `authorLinksField` (line 161) splits on `|`, trims, and calls `authorSlugify`. An entry like `"| https://url"` or `" "` produces name `""` → slug `""` → URL `/authors//`. Guard against empty names (fall back to `defaultAuthor` or skip the entry).
**M-1.2.2 — MEDIUM — `parseMovements` silently drops malformed entries.** `parseMovements` (line 380-397) uses `catMaybes $ map parseOne` — an entry missing `name` or `page` is dropped with zero diagnostic. Compositions with a typo in one movement silently lose it. Add at least a `putStrLn` warning via `unsafeCompiler` or fail loudly.
**L-1.2.3 — LOW — `abstractField` only strips single-`Para` abstracts.** Line 184-186: `Pandoc m [Para ils] -> Pandoc m [Plain ils]`. An abstract with inline `<br>` or line breaks becomes multiple `Para` blocks and the outer `<p>` is not stripped. Harmless but inconsistent.
**L-1.2.4 — LOW — `confidenceTrendField` threshold of ±5 is undocumented.** Line 267-269: `c - p > 5` → up, `p - c > 5` → down. The comment in the header describes behavior but not the threshold. Magic number.
**L-1.2.5 — LOW — `pageScriptsField` uses the script path as the item identifier.** Line 123: `Item (fromFilePath s) s`. If two separate frontmatter entries both load `shared.js`, they collide in Hakyll's item-store the first time `listField` evaluates them. Probably works by accident because the inner `script-src` field just returns `itemBody`; note the risk.
**NIT — `getInt` via `Rational → Double → floor`** (line 396). If a page number is `1000000000000000000` (unlikely), Double precision loss. Use `Scientific.floatingOrInteger` from `scientific` (already transitively available via Aeson).
### 1.3 `Stats.hs`
**M-1.3.1 — MEDIUM — `stripHtmlTags` is naive.** Line 108-111 strips `<...>` greedily, ignoring `>` inside attribute values, `<!-- ... -->` comments, and `<![CDATA[...]]>`. Used to compute word count and reading time for the `/build/` page so the impact is limited, but if a future author writes `alt="a > b"` (rare but legal) it'll slice the content.
**M-1.3.2 — MEDIUM — `walkDir` has no symlink-loop protection.** Line 406-416 recurses through `_site` via `doesDirectoryExist`, which follows symlinks. A developer who accidentally symlinks `_site/a → _site` will infinite-loop the build. Use `doesDirectoryExist` + `pathIsSymbolicLink` (in `directory >= 1.3.6`).
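A symlink-aware walk might look like the sketch below. It is not the `walkDir` from `Stats.hs` (which this audit does not quote); skipping links entirely, rather than following them with a visited set, is an assumed policy chosen for simplicity.

```haskell
import Control.Monad (forM)
import System.Directory (doesDirectoryExist, listDirectory, pathIsSymbolicLink)
import System.FilePath ((</>))

-- Recursively list regular files under a directory without following
-- symlinks, so a link pointing back at an ancestor cannot recurse forever.
walkDir :: FilePath -> IO [FilePath]
walkDir dir = do
  names <- listDirectory dir
  fmap concat . forM names $ \name -> do
    let path = dir </> name
    isLink <- pathIsSymbolicLink path   -- inspects the link itself
    isDir  <- doesDirectoryExist path   -- follows links, hence the check above
    if isLink
      then pure []                      -- assumed policy: ignore all symlinks
      else if isDir
        then walkDir path
        else pure [path]

main :: IO ()
main = walkDir "." >>= mapM_ putStrLn
```

If `_site` legitimately contains symlinked assets, the alternative is to canonicalize each directory and track the set already visited.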
**L-1.3.3 — LOW — `isSafeUrl` allows protocol-relative URLs.** Line 161-164 accepts `"/"`-prefixed values. `"//evil.com"` matches this prefix. All current inputs are Hakyll-derived routes so the exposure is nil, but the comment ("Defense-in-depth URL allowlist") claims more rigor than the implementation provides. Fix: reject `u` that begins with `//`.
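The proposed fix can be sketched as a guard that runs before the `/` prefix check. The list of accepted schemes here is an assumption; the real allowlist lives in `Stats.hs` and is not reproduced in this audit.

```haskell
import Data.List (isPrefixOf)

-- Sketch of a stricter allowlist (accepted schemes are assumed).
isSafeUrl :: String -> Bool
isSafeUrl u
  | "//" `isPrefixOf` u = False   -- protocol-relative: reject before the "/" case
  | "/"  `isPrefixOf` u = True    -- site-relative path
  | otherwise           = any (`isPrefixOf` u) ["https://", "http://"]

main :: IO ()
main = mapM_ (print . isSafeUrl)
  [ "/essays/foo/"            -- True
  , "//evil.com"              -- False
  , "javascript:alert(1)"     -- False
  ]
```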
**L-1.3.4 — LOW — `readFile`/`Aeson.decodeStrict` round-trip.** Line 741 decodes backlinks via `TE.encodeUtf8 (T.pack rawBL)` where `rawBL :: String`. That is `String → Text → ByteString` — three copies. Read the item as `Item ByteString` via `getResourceLBS` (or keep backlinks.json as bytes throughout) to avoid two conversions.
**L-1.3.5 — LOW — Two separate tag sections.** `renderStatsTags` (line 380) and `renderTagsSection` (line 568) are the same function with different names. Consolidate.
**L-1.3.6 — LOW — Lazy `readFile` in `countLinesDir`.** Line 455: `readFile (dir </> e)` holds the handle open until `length (lines content)` is fully forced. Under `forM`, multiple handles may be concurrently open. For a 30-file build directory it's fine; use `Data.Text.IO.readFile` for explicit strictness.
**NIT — `lookupString "title" meta` fallback `"(untitled)"`** (line 71 and many siblings). Fine, but consider extracting a `titleOr` helper since it appears ~6 times.
### 1.4 `Backlinks.hs`
**L-1.4.1 — LOW — `normaliseUrl` does not URL-decode.** Line 188-194: stripping `?` and `#` is done on the raw URL without percent-decoding. A path like `/essays/caf%C3%A9` won't normalize to `/essays/café`. Current build likely does not emit percent-encoded routes, so this is latent.
**L-1.4.2 — LOW — `backlinksField` does not handle the "item with noResult route" case explicitly.** When `getRoute item` is `Nothing`, it fails with `"backlinks: item has no route"`. Fine, but that path is unreachable for items that have an associated rule. Note it in a comment, or remove the branch if it can never be reached.
**NIT — `renderBacklinks` concatenates strings; use blaze-html** to match `Stats.hs`. Not urgent; the output is static per build.
### 1.5 `Citations.hs`
**L-1.5.1 — LOW — Partial functions in `transformInline`.** Line 142: `head keys` / `head nums`. Guarded by `null nums` check above and by the structure of Pandoc `Cite` (never empty from the parser), so this is safe in practice. Swap to `case nums of (n:_) -> ...`.
**L-1.5.2 — LOW — `markerHtml` renders numbers via `tshow` (`T.pack . show`)** but also builds `data-cite-keys` as a space-separated list of HTML IDs with no escaping. If a citation key contains a quote character (unusual but legal), the attribute breaks.
**NIT — `stripRefPrefix` (line 209)** is `"ref-"`-specific; should be renamed `stripPandocRefPrefix` or documented with a pointer to the Pandoc source that emits it.
### 1.6 `Compilers.hs`
**L-1.6.1 — LOW — `pageCompiler` does not save a `toc` snapshot.** OK for pages that use `pageCtx`, but the commonplace, landing, and standalone pages that would benefit from a TOC get no opportunity. Not a bug — an architectural choice worth documenting.
**NIT — `stringify`** is redefined here (line 56-77) in addition to `Filters/Images.hs:119-132` and the one `Text.Pandoc.Shared` exports. Three implementations. Pick one.
### 1.7 `Stability.hs`
**M-1.7.1 — MEDIUM — `readIgnore` uses lazy `readFile`.** Line 44: handle stays open until the whole list is forced. Fine for a single-shot read but the pattern is fragile; `Data.Text.IO.readFile` is strict.
**L-1.7.2 — LOW — `unsafeCompiler` for git subprocess breaks Hakyll's dep tracking.** `stabilityField` calls `git log` via `unsafeCompiler`. Hakyll will not re-run the compiler when HEAD moves. Expected — `make build` always runs `git add content/` + commit first, which updates mtimes — but it's fragile to reason about. Worth a note at the `unsafeCompiler` call site rather than the header docs.
**L-1.7.3 — LOW — `gitDates` ignores `stderr`.** Line 54: `(ec, out, _) <- readProcessWithExitCode ...`; the `_` drops the error output. If the file isn't tracked yet, git prints a warning to stderr; user sees nothing. Log it.
**NIT — `stabilityFromDates` classification is undocumented magic.** `n <= 5 && age < 90` → "revising". These thresholds should be constants with intent comments.
### 1.8 `Catalog.hs`
**M-1.8.1 — MEDIUM — `renderEntry` does not escape frontmatter.** `ceTitle`, `ceYear`, `ceDuration`, `ceInstrumentation`, and `ceUrl` are pasted directly into HTML via `concat`. This is consistent with the site's "author-controlled trusted HTML in titles" convention (`Stats.hs:180-186` calls this out explicitly), but `Catalog.hs` has *no such comment*. If a collaborator's frontmatter contains a stray `<` or a malformed entry, the HTML breaks silently.
Suggest: adopt the `pageLink` convention from `Stats.hs` — escape `href` via `safeHref`, pass title through `preEscapedToHtml` with a documented comment.
**L-1.8.2 — LOW — `renderCategorySection` assumes non-empty group.** Line 194: `categoryLabel (ceCategory (head g))`. `groupBy` never produces an empty sublist, so this is safe, but partial.
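One way to make the non-emptiness visible in the types is `Data.List.NonEmpty.groupBy`, sketched below. The `Entry` record and the rendering are trimmed-down stand-ins, not the definitions in `Catalog.hs`.

```haskell
import           Data.Function (on)
import           Data.List.NonEmpty (NonEmpty (..))
import qualified Data.List.NonEmpty as NE

-- Hypothetical stand-in for the catalog entry record.
data Entry = Entry { ceCategory :: String, ceTitle :: String }

-- NE.groupBy returns [NonEmpty a], so the first element is available by
-- pattern match and no call to `head` is needed.
renderCategorySection :: NonEmpty Entry -> String
renderCategorySection (e :| es) =
  ceCategory e ++ ": " ++ unwords (map ceTitle (e : es))

renderAll :: [Entry] -> [String]
renderAll = map renderCategorySection . NE.groupBy ((==) `on` ceCategory)

main :: IO ()
main = mapM_ putStrLn $ renderAll
  [ Entry "chamber" "Quartet", Entry "chamber" "Trio", Entry "solo" "Etude" ]
```

As with `Data.List.groupBy`, only adjacent entries are grouped, so the input is assumed to be sorted by category first.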
**NIT — `categoryRank` uses `lookup` instead of `elemIndex`.** Shorter:
```haskell
categoryRank c = fromMaybe (length categoryOrder) (elemIndex c categoryOrder)
```
### 1.9 `Commonplace.hs`
**H-1.9.1 — HIGH — Operator-precedence bug in `renderChronoView` (line 126-131).**
```haskell
renderChronoView entries =
    "<div class=\"cp-chrono\" id=\"cp-chrono\" hidden>"
    ++ if null sorted
        then "<p class=\"cp-empty\">No entries yet.</p>"
        else concatMap renderEntry sorted
    ++ "</div>"
```
Parses as `"..." ++ (if null sorted then "..." else (concatMap renderEntry sorted ++ "</div>"))`. When `sorted` is empty, the closing `</div>` is silently dropped. Fix: parenthesize the `if`, or split into two lines with explicit binding.
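The explicit-binding variant of the fix can be sketched as follows. `Entry` and `renderEntry` are hypothetical stand-ins for the real definitions in `Commonplace.hs`, and the sorting step is omitted.

```haskell
-- Stand-ins for the real types; not the definitions in Commonplace.hs.
newtype Entry = Entry String

renderEntry :: Entry -> String
renderEntry (Entry t) = "<p>" ++ t ++ "</p>"

renderChronoView :: [Entry] -> String
renderChronoView entries =
     "<div class=\"cp-chrono\" id=\"cp-chrono\" hidden>"
  ++ body
  ++ "</div>"
  where
    -- Naming the conditional keeps (++) from absorbing "</div>" into the
    -- else branch.
    body
      | null entries = "<p class=\"cp-empty\">No entries yet.</p>"
      | otherwise    = concatMap renderEntry entries

main :: IO ()
main = do
  putStrLn (renderChronoView [])
  putStrLn (renderChronoView [Entry "first", Entry "second"])
```

Both outputs now end in `</div>`, including the empty case.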
**L-1.9.2 — LOW — `renderText` replaces `\n` with `<br>\n`** after escaping, which is correct, but does not escape `\r`. Windows-style line endings would produce `\r<br>`, leaving stray `\r` in HTML. Normalize line endings in `stripTrailingNL`.
### 1.10 `Authors.hs`
**H-1.10.1 — HIGH — `allContent` omits directory-form essays.** Line 50:
```haskell
allContent = ("content/essays/*.md" .||. "content/blog/*.md") .&&. hasNoVersion
```
Compare to `Tags.hs:69`, which adds `"content/essays/*/index.md"`. Any essay stored as `content/essays/foo/index.md` will NOT appear on its author's index page. This is the most likely source of silent "why isn't this essay on my author page" bugs.
**L-1.10.2 — LOW — Duplicate of `Contexts.authorSlugify`.** `Authors.slugify` and `Contexts.authorSlugify` do the same thing with different definitions (the Contexts version normalizes before filtering, Authors version filters after lowercasing). The two will diverge on Unicode edge cases. Consolidate.
### 1.11 `Utils.hs`
**L-1.11.1 — LOW — `wordCount` counts HTML tokens as words.** Called from `Compilers.hs:172` on raw source `src` (Markdown, including any raw HTML) and from `Stats.hs:809` on tag-stripped HTML. On raw Markdown this miscounts `[display](url)` as three "words". Low-severity because the stat is approximate anyway, but worth noting when comparing `/stats/` numbers to `wc`.
### 1.12 `Pagination.hs`, `Tags.hs`, `SimilarLinks.hs`, `Metadata.hs`, `Main.hs`
No material issues. `Metadata.hs` is a two-line empty-module placeholder — delete or populate.
---
## 2. Pandoc filters (`build/Filters/*.hs`)
### 2.1 `Filters/Images.hs` — the big one
**C-2.1.1 — CRITICAL — `lowerExt` returns the basename, not the extension.** Line 110:
```haskell
lowerExt = map toLower . reverse . ('.' :) . takeWhile (/= '.') . tail . dropWhile (/= '.') . reverse
```
Trace for `"image.jpg"`:
1. `reverse` → `"gpj.egami"`
2. `dropWhile (/= '.')` → `".egami"`
3. `tail` → `"egami"`
4. `takeWhile (/= '.')` → `"egami"`
5. `('.' :)` → `".egami"`
6. `reverse` → `"image."`
7. `map toLower` → `"image."`
So `lowerExt "image.jpg" == "image."` — which does not equal `.jpg`, `.jpeg`, `.png`, or `.gif`. **`isLocalRaster` is therefore `False` for every file**, the entire `<picture>`/WebP dispatch is dead code, and `tools/convert-images.sh` produces `.webp` companions that are never referenced.
Fix: `System.FilePath.takeExtension` is already imported elsewhere and already pulled in transitively; replace with
```haskell
lowerExt = map toLower . takeExtension
```
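The trace and the replacement can be checked side by side with a small standalone program. The old definition is copied from the audit; the shape of `isLocalRaster` is an assumption based on the extension list mentioned above.

```haskell
import Data.Char (toLower)
import System.FilePath (takeExtension)

-- The definition quoted in the audit, kept verbatim for comparison.
-- It is also partial: `tail` fails on a path that contains no '.'.
lowerExtOld :: FilePath -> String
lowerExtOld = map toLower . reverse . ('.' :) . takeWhile (/= '.') . tail . dropWhile (/= '.') . reverse

-- The proposed replacement.
lowerExtNew :: FilePath -> String
lowerExtNew = map toLower . takeExtension

-- Assumed shape of the predicate; the real one is in Images.hs.
isLocalRaster :: FilePath -> Bool
isLocalRaster p = lowerExtNew p `elem` [".jpg", ".jpeg", ".png", ".gif"]

main :: IO ()
main = do
  print (lowerExtOld "image.jpg")        -- "image."
  print (lowerExtNew "image.JPG")        -- ".jpg"
  print (isLocalRaster "photos/cat.PNG") -- True
```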
**M-2.1.2 — MEDIUM — `passedKvs` duplicate-emits `id`, `class`, `alt`, `title`.** Line 77:
```haskell
passedKvs = filter (\(k, _) -> k `notElem` ["loading", "data-lightbox"]) kvs
```
But above, `attrId`, `attrClasses`, `attrAlt`, and `attrTitle` already emit those attributes from `(ident, classes, kvs)`. If an author writes `![alt](img.jpg){.foo title="bar"}`, Pandoc places `title` into `kvs`, so the output becomes `<img ... class="foo" title="bar" title="bar">`. Expand the blacklist:
```haskell
passedKvs = filter (\(k, _) -> k `notElem` ["loading", "data-lightbox", "id", "class", "alt", "title"]) kvs
```
Side-note: the same issue affects the non-picture branch at line 47 indirectly (via the `Image` constructor Pandoc emits), but Pandoc's HTML writer handles dedup there.
**M-2.1.3 — MEDIUM — `stringify` catches most but not all inline variants.** Line 119-132: handles `Str`, `Space`, `SoftBreak`, `LineBreak`, `Emph`, `Strong`, `Code`, `Link`, `Image`, `Span`. Misses `Strikeout`, `Superscript`, `Subscript`, `SmallCaps`, `Quoted`, `Cite`, `Math`, `RawInline`. Alt text for an image captioned `~subscript~` will be empty.
**L-2.1.4 — LOW — `renderKvs` does not escape the key.** Line 94: `" " <> k <> "=\"" <> esc v <> "\""`. Keys in Pandoc come from Markdown attribute syntax and can only be identifiers, so this is safe in practice; but it's asymmetric with `v` and deserves either `esc k` or an assertion comment.
**L-2.1.5 — LOW — `isUrl` does not recognize `data:`, `file://`, or `mailto:`.** The first two are acceptable omissions and `mailto:` is not relevant to images, so the check is accurate for the intended domain.
### 2.2 `Filters/Transclusion.hs`
**M-2.2.1 — MEDIUM — `sec` attribute not HTML-escaped.** Line 41:
```haskell
Just (slugToUrl slug, " data-section=\"" ++ sec ++ "\"")
```
`sec` is everything after `#` up to `}}` in the Markdown source. If an author writes `{{essay#a"b}}`, the emitted HTML is `<div … data-section="a"b">` — invalid markup. Not a realistic XSS vector on a single-author static site (would be a self-attack), but:
- It's an injection primitive. The moment content ever comes from a PR, a collaborator, or an imported source, it becomes one.
- The fix is one line: escape `"`, `<`, `>`, `&` before interpolation.
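A minimal sketch of that escape, written per character so that replacement order cannot matter. `escAttr` and `sectionAttr` are hypothetical names; in the real module the existing `escHtml` from `Utils.hs` would presumably be reused instead.

```haskell
-- Escape the four characters that can break a double-quoted attribute.
escAttr :: String -> String
escAttr = concatMap esc
  where
    esc '&' = "&amp;"
    esc '<' = "&lt;"
    esc '>' = "&gt;"
    esc '"' = "&quot;"
    esc c   = [c]

-- Hypothetical helper mirroring the attribute built on line 41.
sectionAttr :: String -> String
sectionAttr sec = " data-section=\"" ++ escAttr sec ++ "\""

main :: IO ()
main = putStrLn (sectionAttr "a\"b")
```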
**L-2.2.2 — LOW — `slugToUrl` appends `.html` unconditionally.** Line 46-49: `slug ++ ".html"`. If the slug is already `page.html`, you get `page.html.html`. Unlikely in practice (source convention is `{{essay-slug}}` with no extension), but guard against it.
**NIT — `trim` re-implemented yet again.** Same function appears at least four times (`Transclusion.hs:59`, `EmbedPdf.hs:80`, `Wikilinks.hs:59`, plus `Contexts.hs`'s `strip`). Factor.
### 2.3 `Filters/Score.hs`
**H-2.3.1 — HIGH — `TIO.readFile fullPath` with no existence check and no exception handling.** Line 40. A Markdown file that references a missing SVG aborts the entire Hakyll build with nothing more than:
```
openFile: does not exist (No such file or directory)
```
No filename, no page context, no recovery. Fix:
```haskell
-- requires: import System.Directory (doesFileExist)
existed <- doesFileExist fullPath
if not existed
    then do putStrLn $ "[Score] missing: " ++ fullPath
            return (Div ("", cls, attrs) blocks)
    else do svgRaw <- TIO.readFile fullPath
            ...
```
Or wrap in `try` and fall back to an `errorBlock` mirroring `Filters.Viz.errorBlock`.
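The `try`-based alternative could look roughly like this. `readSvg` is a hypothetical helper, and the message format is only illustrative; the real filter would turn the `Left` into an error block rather than print it.

```haskell
import           Control.Exception (IOException, try)
import qualified Data.Text as T
import qualified Data.Text.IO as TIO
import           System.IO (hPutStrLn, stderr)

-- Read a file strictly, turning any IOException into a message that
-- names the offending path.
readSvg :: FilePath -> IO (Either String T.Text)
readSvg path = do
  r <- try (TIO.readFile path)
  pure $ case r of
    Left e  -> Left ("[Score] cannot read " ++ path ++ ": " ++ show (e :: IOException))
    Right t -> Right t

main :: IO ()
main = do
  r <- readSvg "no/such/score.svg"
  case r of
    Left msg -> hPutStrLn stderr msg   -- caller would emit an error block here
    Right t  -> print (T.length t)
```

Unlike the `doesFileExist` check, this also covers permission errors and avoids the gap between checking and reading.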
**M-2.3.2 — MEDIUM — Lazy-I/O `readFile` under `walkM`.** Using `Data.Text.IO.readFile` forces immediately, so this is actually OK — I retract the generic concern. The real issue is #H-2.3.1 above.
**L-2.3.3 — LOW — `processColors` is order-sensitive.** The comment on line 56-58 acknowledges it: the 6-digit hex replacements come *last* in the function composition chain, which means they're applied *first*. That's correct and the comment is helpful. Keep the comment.
**L-2.3.4 — LOW — `escHtml` reorder bug.** Line 88-92:
```haskell
escHtml = T.replace "\"" "&quot;"
        . T.replace ">" "&gt;"
        . T.replace "<" "&lt;"
        . T.replace "&" "&amp;"
```
`&` must be replaced *first*; otherwise the `&` in the `&lt;`, `&gt;`, and `&quot;` entities introduced by the other replacements would itself be escaped, producing `&amp;lt;` and so on. Function composition `f . g . h` applied to `x` is `f (g (h x))`, so the chain runs bottom-up: `&` → `<` → `>` → `"`. **This is correct** (`&` first). Retracted. The `Viz.escHtml` at `Viz.hs:178-182` has the same composition order and is also correct. Nit only: write the function as a single chain with a comment stating the invariant.
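The ordering argument is easy to confirm by running both orders. `escHtml` below repeats the composition quoted from `Score.hs`; `escHtmlWrong` is a deliberately broken variant added here for contrast and does not exist in the codebase.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import           Data.Text (Text)
import qualified Data.Text as T
import qualified Data.Text.IO as TIO

-- Invariant: the rightmost replace runs first, so '&' is escaped before
-- any entity containing '&' is introduced.
escHtml :: Text -> Text
escHtml = T.replace "\"" "&quot;"
        . T.replace ">"  "&gt;"
        . T.replace "<"  "&lt;"
        . T.replace "&"  "&amp;"

-- Wrong order, for contrast only: '&' is escaped last, so the '&' inside
-- "&lt;" gets escaped a second time.
escHtmlWrong :: Text -> Text
escHtmlWrong = T.replace "&" "&amp;"
             . T.replace "<" "&lt;"

main :: IO ()
main = do
  TIO.putStrLn (escHtml      "a < b & c")  -- a &lt; b &amp; c
  TIO.putStrLn (escHtmlWrong "a < b & c")  -- a &amp;lt; b &amp; c
```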
### 2.4 `Filters/Viz.hs`
**H-2.4.1 — HIGH — No file-existence check before `readProcessWithExitCode`.** Line 96-99. Same class of bug as Score; the user sees `"non-zero exit"` with no path. Add `doesFileExist fullPath` before spawning.
**M-2.4.2 — MEDIUM — Exception handler drops the exception detail.** Line 99:
```haskell
`catch` (\e -> return (ExitFailure 1, "", show (e :: IOException)))
```
The third tuple element is set to `show e`, but then on line 102 the caller reads it as `err` and displays it. That's actually correct — retracted. BUT the error bubbles up to `errorBlock` which renders `<div class="viz-error">...</div>` inline in the page. That's actually graceful. Good.
**L-2.4.3 — LOW — `escScriptTag` only replaces `</`.** Line 133: correct for JSON embedding but not for content that contains `<!--` or `]]>` inside strings. Vega-Lite specs won't contain those, so fine.
**L-2.4.4 — LOW — `warn` uses `putStrLn` to stdout, not stderr.** Line 176. Mixes with Hakyll's build progress output. Use `hPutStrLn stderr`.
### 2.5 `Filters/Sidenotes.hs`
**H-2.5.1 — HIGH — Label wrap at 26 produces duplicate IDs.** Line 38:
```haskell
toLabel n = T.singleton (toEnum (fromEnum 'a' + (n - 1) `mod` 26))
```
Note 27 → `a` again. Two `<sup id="snref-a">` and two `<sup id="sn-a">` in the same document. Duplicate IDs are invalid HTML, break `href="#sn-a"` fragment navigation, and confuse ATs.
Fix options:
1. Use numeric labels: `"sn" ++ show n`.
2. Use two-letter labels for n > 26: `aa`, `ab`, …, `zz`.
3. Fail loudly with `error`: essays with >26 footnotes are rare and the user should know.
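Option 2 generalizes to spreadsheet-style labels that never repeat. The sketch below returns `String` rather than `Text` to stay dependency-free; wrapping the result in `T.pack` would match the original signature.

```haskell
-- Bijective base 26: 1 -> "a", 26 -> "z", 27 -> "aa", 28 -> "ab", ...
-- Every positive n maps to a distinct label.
toLabel :: Int -> String
toLabel n
  | n <= 0    = error "toLabel: sidenote numbers start at 1"
  | otherwise = go n ""
  where
    go 0 acc = acc
    go k acc =
      let (q, r) = (k - 1) `divMod` 26
      in  go q (toEnum (fromEnum 'a' + r) : acc)

main :: IO ()
main = mapM_ (putStrLn . toLabel) [1, 26, 27, 52, 53, 702, 703]
```

This keeps labels 1 through 26 identical to the current output, so existing fragment links to `#sn-a` through `#sn-z` are unaffected.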
**M-2.5.2 — MEDIUM — `replacePTags` is a string-level hack.** Line 57-60:
```haskell
replacePTags =
    T.replace "<p>" "<span class=\"sidenote-para\">"
    . T.replace "</p>" "</span>"
```
A footnote whose content contains the literal text `<p>` (e.g., a code sample discussing HTML) will be mangled. Rare but possible. The correct fix is to transform the AST before writing, not the post-rendered HTML.
### 2.6 `Filters/Links.hs`
**M-2.6.1 — MEDIUM — `isExternal` uses substring match for the site domain.** Line 59:
```haskell
isExternal url =
    ("http://" `T.isPrefixOf` url || "https://" `T.isPrefixOf` url)
    && not ("levineuwirth.org" `T.isInfixOf` url)
```
`https://evil-levineuwirth.org.attacker.com/phish` contains `levineuwirth.org` as a substring → classified as *internal* → no `rel=noopener noreferrer target=_blank`. In 2026 with partitioned cookies this is mostly a cosmetic concern, but fix is trivial:
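```haskell
-- requires: import Control.Applicative ((<|>))
isSameHost :: Text -> Bool
isSameHost url =
    case T.stripPrefix "https://" url <|> T.stripPrefix "http://" url of
        Nothing   -> False
        Just rest ->
            -- The host ends at the first path, port, query, or fragment delimiter.
            let host = T.takeWhile (`notElem` ("/:?#" :: String)) rest
            in  host == "levineuwirth.org"
                    || ".levineuwirth.org" `T.isSuffixOf` host
```
The suffix check admits subdomains such as `www.levineuwirth.org` while rejecting look-alike hosts; `isExternal` then becomes "has an `http(s)://` prefix and `not (isSameHost url)`".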
**M-2.6.2 — MEDIUM — PDF links with fragment are not rewritten.** Line 30-36 requires `` ".pdf" `T.isSuffixOf` url ``. A URL like `/papers/foo.pdf#page=5` ends in `#page=5`, not `.pdf`, so it doesn't route through the PDF.js viewer. Compare to `EmbedPdf.hs`, which does handle fragments in the source preprocessor path. Inconsistent.
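A fragment-tolerant check might be sketched like this. `splitPath` and `isPdfLink` are hypothetical names, and the case-insensitive comparison is an assumption that goes beyond what the current filter is described as doing.

```haskell
import Data.Char (toLower)
import Data.List (isSuffixOf)

-- Split a URL into (path, rest) at the first '?' or '#'.
splitPath :: String -> (String, String)
splitPath = break (`elem` "?#")

-- True when the path component ends in ".pdf", ignoring query and fragment.
isPdfLink :: String -> Bool
isPdfLink url = ".pdf" `isSuffixOf` map toLower (fst (splitPath url))

main :: IO ()
main = mapM_ (print . isPdfLink)
  [ "/papers/foo.pdf"         -- True
  , "/papers/foo.pdf#page=5"  -- True
  , "/papers/foo.html#a.pdf"  -- False
  ]
```

The second component of `splitPath` carries the fragment, so it can be appended unchanged to the rewritten viewer URL.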
**L-2.6.3 — LOW — `domainIcon` duplicates twitter/x and youtube/youtu.be mappings.** Fine. Nit: table-driven via `lookup` would be cleaner than the chain of guards.
### 2.7 `Filters/Wikilinks.hs`
**M-2.7.1 — MEDIUM — `toMarkdownLink` does not escape `]` or `)`.** Line 33-36:
```haskell
toMarkdownLink inner =
    let (title, display) = splitOnPipe inner
        url              = "/" ++ slugify title
    in  "[" ++ display ++ "](" ++ url ++ ")"
```
If the display text contains `]` or `)`, the generated Markdown is broken and Pandoc will parse it as raw text or as a weird link. Rare in practice (wikilink display is usually a plain name), but worth escaping.
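Backslash-escaping the display text is one possible fix. The sketch below takes the already-split display text and URL as arguments, so `splitOnPipe` and `slugify` are left out; the two-argument `toMarkdownLink` is a hypothetical variant, not the signature in `Wikilinks.hs`. It assumes the URL comes from `slugify` and therefore contains no parentheses.

```haskell
-- Backslash-escape characters that could end the link text early or be
-- misread as link syntax.
escapeLinkText :: String -> String
escapeLinkText = concatMap esc
  where
    esc c
      | c `elem` "[]()\\" = ['\\', c]
      | otherwise         = [c]

-- Hypothetical variant taking pre-split arguments.
toMarkdownLink :: String -> String -> String
toMarkdownLink display url = "[" ++ escapeLinkText display ++ "](" ++ url ++ ")"

main :: IO ()
main = putStrLn (toMarkdownLink "Array[0] (notes)" "/array-notes")
```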
**L-2.7.2 — LOW — `slugify` uses `intercalate "-" . words . ...`, so "a.b" → "a b" → "a-b".** That's by design (punctuation becomes a space, which becomes a hyphen). Trailing punctuation does not leave a trailing hyphen: for "end." the space after "end" is discarded by `words`, giving `["end"]` → "end". OK.
**NIT — Inefficient `trim`: `reverse . dropWhile (== ' ') . reverse . dropWhile (== ' ')`.** Use `T.strip` if inputs were Text. `String`-based pipeline makes this unavoidable.
### 2.8 `Filters/EmbedPdf.hs`
**M-2.8.1 — MEDIUM — `encodeQueryValue` does not encode `#`.** Line 68-76: the encoder is called on `filePath`, which is already split on `#` by `parseDirective` (line 38). So the unencoded `#` issue doesn't bite here. However, the docstring at line 65 says "percent-encode characters that would break a query-string value" — `#` is such a character. Add it for defense in depth, even if the current call site doesn't benefit.
**L-2.8.2 — LOW — `parsePageHash` silently produces `""` for invalid fragments.** Line 45-51. An author writing `{{pdf:/foo.pdf#garbage}}` silently drops the fragment. No warning.
### 2.9 `Filters/Typography.hs`, `Filters/Code.hs`, `Filters/Smallcaps.hs`, `Filters/Dropcaps.hs`, `Filters/Math.hs`
Scanned via the parallel sub-audit; only nit-level findings apply (duplicate `escHtml`, smart-quote edge case in abbreviation matching, `apply = id` placeholders).
---
## 3. Static JavaScript (`static/js/*.js`)
Audited by parallel exploration. The full per-file list is long; the aggregate pattern is: **no user-authored content is ever injected**, so `innerHTML` usage across `popups.js`, `annotations.js`, `citations.js`, and `selection-popup.js` is **not an XSS vector under the current authoring model**. The risk profile changes the moment the site accepts PRs, gains an annotations-backend, or proxies third-party content (none of which are planned per `spec.md`).
### 3.1 XSS surface (all author-trust scoped)
**M-3.1.1 — MEDIUM — `popups.js:608-614` copies `innerHTML` from the page into the popup.** The `epistemicContent` provider does `html += '<div class="ep-compact">' + compact.innerHTML + '</div>'`. Because the source (`.ep-compact`) is emitted by our own Haskell code (`Contexts.hs` + templates), this is safe under the trust model. Switch to `compact.cloneNode(true)` + `popup.appendChild()` for a defense-in-depth fix that costs nothing.
**M-3.1.2 — MEDIUM — `popups.js` cross-origin fetches (Wikipedia, arXiv, CrossRef, GitHub, etc.) don't validate `Content-Type`.** A malicious CORS-enabled endpoint could return HTML that the popup would render. Every fetch already pipes through an `esc()` call (line 655-661), so the risk is bounded to text that escapes in some corner.
**L-3.1.3 — LOW — `citations.js:15, 56` and `annotations.js:167-172` use `innerHTML` with escaped data.** The escaping is correct; the fragility is that the escape-before-concat pattern is easy to get wrong in the future.
### 3.2 Event handling / lifecycles
**M-3.2.1 — MEDIUM — `sidenotes.js:73-94` attaches listeners per-sidenote with no cleanup path.** When `transclude.js` re-renders a fragment on resize, sidenotes accumulate duplicate handlers. Net effect: `update()` gets called 2×, 3×, … on hover over the same sidenote. Not a bug in the output, but a measurable leak over a long session.
**M-3.2.2 — MEDIUM — `popups.js` attaches listeners at load time and never re-binds for transcluded content.** A transcluded essay's internal links have no popup previews. If transclusion is meant to feel "live", this is a user-visible gap.
**M-3.2.3 — MEDIUM — `semantic-search.js:66-74` race in `loadModel`.** If two searches fire before the first model-load resolves, both call `import()` and `pipeline()`. Second call wastes CPU + memory. Track in-flight Promise:
```js
if (loadPromise) return loadPromise;
loadPromise = import(CDN).then(...);
```
### 3.3 Accessibility
**H-3.3.1 — HIGH — `gallery.js` overlay has no focus trap.** `openOverlay()` focuses the close button, but Tab escapes into the backdrop. Pattern to copy: `settings.js:35-49`.
**M-3.3.2 — MEDIUM — `selection-popup.js` annotation picker color swatches are mouse-only.** Arrow-key navigation + Enter to select would make it keyboard-accessible.
**M-3.3.3 — MEDIUM — `sidenotes.js` sidenote focus toggle is click-only.** No keyboard equivalent.
**L-3.3.4 — LOW — `lightbox.js:18,42` defaults `img.alt` to `""` and only later populates from source.** If source alt is missing, the lightbox image has no accessible name. Use `img.alt = srcAlt || 'Lightbox image'`.
**L-3.3.5 — LOW — `theme.js:9-28` does not `try/catch` around `localStorage.getItem`.** Private-browsing Safari throws. The code happens to work because `getItem` returns `null` on failure *in most browsers*, but not all.
### 3.4 Duplication and style
**L-3.4.1 — LOW — HTML escaping reimplemented 3× across `annotations.js`, `popups.js`, `semantic-search.js`.** Add a shared `utils.js` (one function).
**L-3.4.2 — LOW — Mixed `var` vs `const`/`let`.** `citations.js`, `nav.js`, `sidenotes.js`, `toc.js` use modern ES6+; `popups.js`, `annotations.js`, `gallery.js` use `var`. Pick one.
**NIT — Magic-number sprinkles** for delays (`SHOW_DELAY=250`, `HIDE_DELAY=150`, `SHOW_DELAY=450`, swipe threshold `30`, etc.). Not worth a refactor.
---
## 4. CSS and HTML templates
Audited by parallel exploration. Highlights:
### 4.1 CSS
**H-4.1.1 — HIGH — Undefined CSS custom properties.** `build.css` uses `--rule` (lines 21, 30, 39, 69), `components.css:1448` uses `--bg-subtle`, and `--font-ui` appears in many places; none of the three is defined in `base.css`. A `var()` reference to an undefined property with no fallback makes the declaration invalid at computed-value time, so the property falls back to its inherited or initial value → silent visual degradation on the `/build/` and annotation-related pages.
Fix:
```css
:root {
  --rule: var(--border-muted);
  --font-ui: var(--font-sans);
  --bg-subtle: #f5f5f5;
}
[data-theme="dark"] { --bg-subtle: #1f1f1f; }
```
**H-4.1.2 — HIGH — Dark-mode `--text-faint` contrast fails WCAG AA.** `#6a6660` on `#121212` ≈ 3.3:1 by the WCAG relative-luminance formula, below the 4.5:1 AA minimum for normal-size text. Used for sidenote numbers (0.65em!) and disabled-state icons. Bump to ~`#8b8680` (≈5.2:1) at minimum.
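The contrast figures for the two colours named above can be rechecked from the WCAG 2.x definition of relative luminance. The following is a minimal sketch, not part of the repository; the function names are illustrative.

```python
def _linear(channel: int) -> float:
    """Convert one 8-bit sRGB channel to linear light (WCAG 2.x definition)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    """Relative luminance of a '#rrggbb' colour."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linear(r) + 0.7152 * _linear(g) + 0.0722 * _linear(b)


def contrast_ratio(fg: str, bg: str) -> float:
    """WCAG contrast ratio; the lighter colour always goes in the numerator."""
    hi, lo = sorted((relative_luminance(fg), relative_luminance(bg)), reverse=True)
    return (hi + 0.05) / (lo + 0.05)


if __name__ == "__main__":
    # Current and proposed --text-faint values against the dark background.
    for fg in ("#6a6660", "#8b8680"):
        print(fg, round(contrast_ratio(fg, "#121212"), 2))
```

WCAG AA asks for at least 4.5:1 on normal-size text, so any candidate replacement colour can be run through `contrast_ratio` before it is committed to `base.css`.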
**H-4.1.3 — HIGH — TOC collapse hides content from keyboard + AT.** `components.css:433-436` uses `visibility: hidden` on collapsed TOC, which removes it from the accessibility tree. Use `aria-expanded` + height transition, or `aria-hidden="true"` explicitly, or `display: none` (losing the smooth collapse).
**H-4.1.4 — HIGH — No consistent `:focus-visible` ring across interactive elements.** `.nav-portal-toggle`, `.settings-toggle`, `.toc-toggle`, `.annotation-toggle` lack focus styles. Add a global:
```css
button:focus-visible, a:focus-visible {
  outline: 2px solid var(--text);
  outline-offset: 2px;
}
```
**M-4.1.5 — MEDIUM — Hardcoded hex in `print.css`.** `#fff`, `#000`, `#f9f9f9`, `#ddd` bypass variables. Move into a `@media print` `:root` overrides block.
**M-4.1.6 — MEDIUM — Breakpoints are scattered.** `540px`, `680px`, `900px`, `1100px`, `1500px` appear across files with no central definition. Define once in `base.css`:
```css
:root {
  --bp-phone: 540px;
  --bp-tablet: 680px;
  --bp-desktop: 900px;
  --bp-wide: 1500px;
}
```
(Note: CSS variables cannot be used inside `@media` queries; use Sass or a preprocessor, or settle for a comment + grep discipline.)
**L-4.1.7 — LOW — Inconsistent transition timings.** `0.15s`, `0.28s`, `0.3s`, `0.35s`, `0.5s` scattered. Three tokens would cover all cases.
**L-4.1.8 — LOW — `font-variant` shorthand resets sibling longhands.** `reading.css:95` and `library.css:22` use `font-variant: small-caps`, which as a shorthand resets the other `font-variant-*` longhands (ligatures, numeric figures, and so on) to their initial values. Use `font-variant-caps: small-caps`.
### 4.2 HTML templates
**M-4.2.1 — MEDIUM — `templates/default.html:30-33` inline onload script.** The KaTeX bootstrap is an inline onload attribute containing a multi-line JS expression. Works, but it is incompatible with any future strict CSP: inline event handlers are blocked unless the policy permits `'unsafe-inline'`. Move to an external `katex-bootstrap.js` served from `/js/`.
**L-4.2.2 — LOW — `templates/partials/nav.html` buttons lack `type="button"`.** If any nav is ever placed inside a `<form>`, Enter will submit. Belt-and-suspenders fix: add `type="button"` to every `<button>` that isn't a submit.
**L-4.2.3 — LOW — `templates/partials/head.html` loads all component CSS unconditionally plus three conditional files.** Not a perf bug on HTTP/2, but `components.css` (1464 lines) is loaded even on the homepage. Split.
---
## 5. Python tooling (`tools/*.py`)
### 5.1 `tools/embed.py`
**H-5.1.1 — HIGH — Root URL becomes `"/./"`.** Lines 68-73:
```python
def _url_from_path(html_path: Path) -> str:
    rel = html_path.relative_to(SITE_DIR)
    if rel.name == "index.html":
        url = "/" + str(rel.parent) + "/"
        return url.replace("//", "/")
    return "/" + str(rel)
```
For `_site/index.html`, `rel.parent` is `Path(".")`. `str(Path("."))` is `"."` on Linux. Result: `"/./"`. Haskell's `SimilarLinks.normaliseUrl` produces `"/"` for the same route, so lookup fails and the homepage never gets similar-links suggestions.
Fix:
```python
if rel.name == "index.html":
    parent = str(rel.parent)
    if parent in (".", ""):
        return "/"
    return "/" + parent + "/"
```
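For checking the three URL shapes in isolation, a self-contained version of the fixed function might look like the sketch below. It is an illustration, not the code in `tools/embed.py`: `SITE_DIR = Path("_site")` is an assumption, the function takes the site directory as a parameter for testability, and `as_posix()` is swapped in for `str()` so separators stay `/` on any platform.

```python
from pathlib import Path

SITE_DIR = Path("_site")  # assumption: the Hakyll output directory


def url_from_path(html_path: Path, site_dir: Path = SITE_DIR) -> str:
    """Map a built HTML file to its site-relative URL."""
    rel = html_path.relative_to(site_dir)
    if rel.name == "index.html":
        parent = rel.parent.as_posix()
        # The site root has parent "." and must map to "/" rather than "/./".
        if parent in (".", ""):
            return "/"
        return "/" + parent + "/"
    return "/" + rel.as_posix()


if __name__ == "__main__":
    for p in ("_site/index.html", "_site/essays/foo/index.html", "_site/about.html"):
        print(p, "->", url_from_path(Path(p)))
```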
**L-5.1.2 — LOW — No `--quiet` mode.** `embed.py` prints progress unconditionally; CI builds get noise.
**L-5.1.3 — LOW — `needs_update()` uses `rglob("*.html")` over `_site`.** Fine, but for a large `_site/` this re-stats every HTML file on every build. Could be cached via a single inode-level watermark file.
**NIT — `EXCLUDE_URLS` comparison against `/search/`, `/build/`, etc. works only because `_url_from_path` matches those exact forms.** A refactor could break the set. Document.
### 5.2 `tools/import-poetry.py`
**H-5.2.1 — HIGH — `yaml_str()` does not escape newlines.** Lines 193-203. An `abstract`, `attribution`, or first-line containing `\n` yields invalid YAML. Add `\n`, `\r` to the `needs_quote` character set.
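One way to make the quoting robust is to delegate escaping to `json.dumps`, since a JSON string is also a valid YAML double-quoted scalar. The sketch below is a hypothetical stand-in for `yaml_str()`, not the function in the repository: the `NEEDS_QUOTE` set is an assumption, and it does not handle plain scalars that YAML would read as booleans, nulls, or numbers.

```python
import json

# Assumption: characters that force quoting. The real needs_quote set may differ.
NEEDS_QUOTE = set(":#{}[],&*!|>'\"%@`\n\r\t")


def yaml_str(value: str) -> str:
    """Return value as a YAML scalar, double-quoting when a plain scalar is unsafe."""
    unsafe = (
        value == ""
        or value != value.strip()
        or any(ch in NEEDS_QUOTE for ch in value)
    )
    if unsafe:
        # json.dumps escapes newlines, quotes, and backslashes into one line.
        return json.dumps(value, ensure_ascii=False)
    return value
```

With this shape a multi-line abstract is emitted as a single quoted line with `\n` escapes, so the frontmatter stays parseable.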
**H-5.2.2 — HIGH — Empty `title_prefix` / `collection_slug` silently collide.** Line 328. If `--collection` is all punctuation, the slug becomes empty and every poem writes to the same path. Add an up-front assertion:
```python
if not collection_slug or collection_slug == "-":
    sys.exit(f"error: collection slug is empty (check --collection={args.collection!r})")
```
**M-5.2.3 — MEDIUM — `--date` is unvalidated.** Line 313. User can pass `--date "last tuesday"` and it flows into YAML unchanged. Parse as `int` in `[1, 2100]`.
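The range check can live in an `argparse` type callback, so a bad value is rejected before any file is written. This is a minimal sketch under the assumption that `--date` is meant to be a bare year; `year_arg` is a hypothetical name, and the parser below is a stand-in for the tool's real one.

```python
import argparse


def year_arg(raw: str) -> int:
    """argparse type callback: accept only an integer year in [1, 2100]."""
    try:
        year = int(raw)
    except ValueError:
        raise argparse.ArgumentTypeError(f"expected an integer year, got {raw!r}")
    if not 1 <= year <= 2100:
        raise argparse.ArgumentTypeError(f"year {year} is outside [1, 2100]")
    return year


# Stand-in parser; the real tool defines more options.
parser = argparse.ArgumentParser()
parser.add_argument("--date", type=year_arg)
```

With this in place, `--date "last tuesday"` makes `argparse` print a usage error and exit instead of flowing into the YAML.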
**M-5.2.4 — MEDIUM — `roman_to_int` has no explicit bounds check.** Lines 45-52. Guarded by a regex at the call site; fine today, but the function should validate its own input rather than rely on the caller.
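A defensive version could validate its own input before converting. The sketch below is an illustration rather than the repository's implementation; the regex accepts standard numerals from I to MMMCMXCIX, which is an assumed range.

```python
import re

# Standard-form Roman numerals, 1..3999 (assumed range for poem numbering).
_ROMAN_RE = re.compile(r"M{0,3}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})")
_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def roman_to_int(numeral: str) -> int:
    """Convert a Roman numeral to an int, raising ValueError on malformed input."""
    up = numeral.upper()
    if not up or not _ROMAN_RE.fullmatch(up):
        raise ValueError(f"not a valid Roman numeral: {numeral!r}")
    total = 0
    # Pair each symbol with its successor; a smaller symbol before a larger
    # one is subtracted (IV, IX, XL, ...), otherwise it is added.
    for ch, nxt in zip(up, up[1:] + " "):
        value = _VALUES[ch]
        if nxt != " " and _VALUES[nxt] > value:
            total -= value
        else:
            total += value
    return total
```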
**L-5.2.5 — LOW — `write_text(content, encoding="utf-8")` with no `errors=` argument.** The default (`strict`) raises `UnicodeEncodeError` on code points UTF-8 cannot encode, which in practice means lone surrogates. Pick `errors="strict"` or `errors="replace"` intentionally.
### 5.3 `tools/viz_theme.py`
**L-5.3.1 — LOW — `save_svg` has no `try/finally` around `plt.close(fig)`.** If `savefig` raises, matplotlib state leaks. This is a standalone CLI tool, so the impact is limited to one leaked figure.
### 5.4 `pyproject.toml` / `uv.lock`
**M-5.4.1 — MEDIUM — No upper bounds on dependencies.** `torch>=2.5`, `sentence-transformers>=3.4`, `faiss-cpu>=1.9`, `numpy>=2.0`. A future major release can break the build silently. Add an upper bound below the next major version to each (for example `sentence-transformers>=3.4,<4`).
---
## 6. Shell scripts and `Makefile`
### 6.1 `Makefile`
**M-6.1.1 — MEDIUM — `make deploy` pushes to GitHub *before* rsync.** Lines 57-58:
```makefile
deploy: clean build sign
	git push -u origin main
	rsync -avz --delete _site/ $(VPS_USER)@$(VPS_HOST):$(VPS_PATH)/
```
If `rsync` fails, the GitHub push has already succeeded, so the remote is ahead of the deployed site. Reversing the order (rsync first, then push on success) would be safer, though `make` won't auto-rollback either way.
**M-6.1.2 — MEDIUM — `deploy` uses `$(VPS_USER)`, `$(VPS_HOST)`, `$(VPS_PATH)` with no definition in the Makefile.** They must come from `.env`. If any is unset, the rsync destination degenerates (with all three empty it expands to `@:/`), so the command either errors out obliquely or silently opens an SSH connection to the wrong place with `--delete` in effect. Add a guard:
```makefile
deploy: clean build sign
	@test -n "$(VPS_USER)" || (echo "VPS_USER not set" >&2; exit 1)
	@test -n "$(VPS_HOST)" || (echo "VPS_HOST not set" >&2; exit 1)
	@test -n "$(VPS_PATH)" || (echo "VPS_PATH not set" >&2; exit 1)
...
```
**M-6.1.3 — MEDIUM — `build` commits content before building but never cleans up on failure.** Lines 9-10:
```makefile
	@git add content/
	@git diff --cached --quiet || git commit -m "auto: $$(date -u +%Y-%m-%dT%H:%M:%SZ)"
```
If the subsequent `cabal run site -- build` fails, the commit is already in place. Subsequent `make build` retries will see a clean diff and succeed, masking the original failure in history as a "trailing" auto-commit. Low severity — the memory note about always `make clean && make build` + the `deploy: clean build` target make this infrequent.
**L-6.1.4 — LOW — `clean` only runs `cabal run site -- clean`.** That cleans `_site` and `_cache` (Hakyll's store), but not `dist-newstyle/` (cabal build output) or the embeddings under `data/` (gitignored but stale). Arguably correct-as-designed: deep clean is `git clean -fdX`. Document.
**L-6.1.5 — LOW — `pdf-thumbs` recipe uses unquoted `$$pdf` in `-nt` test.** If a PDF filename contains a space (rare), the test misparses. Quote:
```makefile
if [ ! -f "$${thumb}.png" ] || [ "$$pdf" -nt "$${thumb}.png" ]; then
```
(The file path IS quoted — false alarm. Retracted.)
**L-6.1.6 — LOW — `export` on line 6 is blunt.** Every Make variable and every variable inherited from the shell becomes available to every recipe. For a solo build this is fine; be aware of scope creep.
**NIT — `build-start.txt` via file I/O instead of make variable.** Lines 11, 23. A single recipe could use shell arithmetic, avoiding the scratch file (and the need to gitignore it).
### 6.2 `tools/sign-site.sh`
**L-6.2.1 — LOW — `find | xargs -I{}` with `-P $(nproc)` is vulnerable to pathological filenames.** The `-I{}` substitution plus `-0` is safe for spaces, but if `$(nproc)` returns `0` (cgroup edge cases), `-P 0` means "as many as possible", which is arguably fine but non-obvious. Explicit: `-P "${JOBS:-$(nproc)}"`.
**NIT — Hardcoded key fingerprint `C9A42A6FAD444FBE566FD738531BDC1CC2707066`.** Expected, but document that key rotation requires editing both this script and `preset-signing-passphrase.sh`.
### 6.3 `tools/convert-images.sh`
No issues of note. `set -euo pipefail`, `find -print0 | read -d ''`, quoting are all correct.
### 6.4 `tools/download-model.sh`
**L-6.4.1 — LOW — No checksum verification on downloaded ONNX.** Line 26 `curl -fsSL $BASE_URL/$src -o $dst`. If HuggingFace is compromised or returns a different build of the model, the site ships a trojan without warning. Pin expected SHA-256 and verify.
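The verification step amounts to hashing the downloaded file and comparing against a pinned digest. The script itself is shell, where `sha256sum -c` would be the natural tool; the same logic is sketched here in Python for illustration. `verify_download` and its arguments are hypothetical names, and the expected digest would have to be recorded from a trusted copy of the model.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large models need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: Path, expected_hex: str) -> None:
    """Raise ValueError if the file does not match the pinned digest."""
    actual = sha256_of(path)
    if actual != expected_hex.lower():
        raise ValueError(
            f"checksum mismatch for {path}: expected {expected_hex}, got {actual}"
        )
```

Failing closed (raising, and deleting the file in the caller) keeps a mismatched model from ever reaching `_site/`.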
**NIT — Hardcoded HuggingFace URL.** Document that an official mirror is unavailable; that's why you pull from `resolve/main` rather than a pinned revision.
### 6.5 `tools/subset-fonts.sh`
**L-6.5.1 — LOW — Paths are Arch-specific.** `/usr/share/fonts/ttf-spectral`, `/usr/share/fonts/TTF`. On Debian/Ubuntu, JetBrains Mono lives at `/usr/share/fonts/truetype/jetbrains-mono/`. Detect via `fc-match` or document as Arch-only.
### 6.6 `tools/preset-signing-passphrase.sh`, `tools/refreeze.sh`
Clean. No material issues.
---
## 7. Repository hygiene and configuration
### 7.1 `.gitignore`
**L-7.1.1 — LOW — `.gitignore` covers vim swap files (`*.swp`/`*.swo`).** OK. ✓
**L-7.1.2 — LOW — `dist-newstyle/`, `_site/`, `_cache/`, `.env`, `IGNORE.txt` all correctly ignored.** ✓
**NIT — `paper/` is tracked but its purpose is unclear.** Not in this audit's scope but worth a README.
### 7.2 Files in repo root that shouldn't be
- `BeyondComorbidityIndices.docx` (3.8 MB) + `BeyondComorbidityIndicesSupplement.docx` (1.6 MB) — untracked, but 5.4 MB of binary clutter at the project root. Move into `paper/` or `drafts/`.
- `HOMEPAGE.md~` — empty editor backup, gitignored but on disk.
- `HOMEPAGE.md`, `WRITING.md`, `migrate_html.md` — workspace notes without a home. Consider `docs/` or `notes/`.
- `content/modern_idolatry.md` — untracked Markdown file in `content/` that isn't `content/drafts/`. Either move under drafts or commit.
- `IGNORE.txt` — exists (empty), gitignored, used by the stability pin mechanism. Clean.
### 7.3 `levineuwirth.cabal`
**L-7.3.1 — LOW — `-Wno-unused-imports` masks real unused imports.** Set at the executable level. This hid the `Metadata.hs` no-op and the `Data.List.intercalate` in several modules. Delete the flag and fix the warnings.
**L-7.3.2 — LOW — Version bounds are present but `< 4.17` on Hakyll and `< 3.7` on Pandoc pin the project to a specific minor release window.** Good discipline, but document the refreeze cadence (there's `tools/refreeze.sh` — reference it in README).
**NIT — `bytestring < 0.13`** is ambitiously loose; the Pandoc ecosystem tends to follow `bytestring < 0.12`. Verify by running `cabal outdated --v2-freeze-file`.
### 7.4 `cabal.project`
`-O1` for the build program is the right call — Hakyll build time is dominated by Pandoc, not the wrapper. ✓
### 7.5 `pyproject.toml`
See finding M-5.4.1. Otherwise clean.
### 7.6 README
**M-7.6.1 — MEDIUM — `README.md` is a single line: `# levineuwirth.org`.** The project has a 63 KB `spec.md`, multiple build flows, optional features (.venv for embeddings, download-model for semantic search), a signing setup, and an rsync deployment target. None of this is documented. A new contributor (or future-you after a two-year hiatus) cannot get started from what's here.
Minimum viable README:
1. One-sentence description.
2. `make build`, `make dev`, `make deploy` entrypoints.
3. Optional: `.venv` setup via `uv sync` for embeddings; `make download-model` for client-side semantic search.
4. `.env` format (link to `.env.example`).
5. Pointer to `spec.md` for architecture.
---
## 8. Cross-cutting observations
### 8.1 Duplicate code
At least five independent implementations of HTML escaping (`Utils.hs`, `Images.hs`, `Score.hs`, `Smallcaps.hs`, `Viz.hs`, plus 3× in JS). At least four implementations of `trim` (`Transclusion`, `EmbedPdf`, `Wikilinks`, plus `Contexts.strip`). Two of `slugify`/`authorSlugify`. Two of `stringify` (`Compilers.hs`, `Images.hs`, plus `Text.Pandoc.Shared.stringify` in the library). Two `normaliseUrl` (`Backlinks.hs`, `SimilarLinks.hs`) — **almost** identical but with different `index.html` handling, so they cannot be naively merged.
Recommendation: create a `build/Common.hs` (or separate `build/Text.hs`) for `escapeHtml`, `trim`, `stringify`, and consolidate where possible.
### 8.2 Partial functions
Partial functions are used in several places with explicit guards (`last`, `head`, `!!`, `fromJust`). All are safe under their current guards, but the pattern is riskier than case analysis. Audit: `Contexts.hs:263`, `Citations.hs:142`, `Stats.hs:125 (median)`, `Stability.hs:75`, `Catalog.hs:194`.
### 8.3 Error handling consistency
Two different patterns:
- `Score.hs` and older filters: `readFile` blows up, no diagnostic.
- `Viz.hs` and `Stability.hs`: `catch` + `errorBlock` / fallback.
Standardize on the second pattern across all IO-performing filters.
### 8.4 Trust boundary is unstated
The codebase leans on a "the author writes and reviews everything" assumption for:
- Frontmatter metadata (used raw in HTML by `Catalog.hs`, `Stats.hs`, `Contexts.hs`).
- Wikilink / transclusion slugs (used raw in HTML by `Transclusion.hs`).
- Bibliography entries (used raw in HTML by `Citations.hs`, `Backlinks.hs`).
This is a defensible design for a single-author site. Document it in `spec.md`. If the day ever comes that a PR from a collaborator is accepted, or that a user-provided input feeds any of these fields, the trust boundary needs to be revisited across all of these call sites simultaneously.
### 8.5 Build reproducibility
- Python dependencies are upper-bound-free (M-5.4.1).
- Model download (`tools/download-model.sh`) is unpinned by SHA (L-6.4.1).
- KaTeX and Vega are loaded from CDN (`templates/partials/head.html:26, 34-36`) without SRI hashes.
- Pandoc version is bounded (`>= 3.1 && < 3.7`), good but citeproc behavior varies subtly across these.
Add SRI to CDN assets; pin the ONNX model to a specific revision + SHA; tighten Python pins.
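An SRI value is the base64 encoding of a digest of the exact bytes served, prefixed with the algorithm name. The sketch below shows how one could be computed for a locally downloaded copy of a CDN asset; `sri_hash` is an illustrative helper, not something in the repository.

```python
import base64
import hashlib


def sri_hash(data: bytes, algorithm: str = "sha384") -> str:
    """Return a Subresource Integrity value such as 'sha384-<base64 digest>'."""
    digest = hashlib.new(algorithm, data).digest()
    return f"{algorithm}-{base64.b64encode(digest).decode('ascii')}"


if __name__ == "__main__":
    import sys

    # Usage: python sri.py path/to/downloaded-asset.js
    with open(sys.argv[1], "rb") as fh:
        print(sri_hash(fh.read()))
```

The result goes into the tag's `integrity` attribute alongside `crossorigin="anonymous"`, and has to be regenerated whenever the pinned CDN version changes.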
### 8.6 Accessibility posture
Strong foundation (skip link, ARIA on nav, semantic elements, reduce-motion support), with localized gaps:
- Gallery overlay focus trap (H-3.3.1)
- Collapsed TOC (H-4.1.3)
- Dark-mode text-faint contrast (H-4.1.2)
- Keyboard-only equivalents for sidenote + annotation picker interactions
Addressing H-3.3.1, H-4.1.3, and H-4.1.2 alone would raise the overall a11y grade meaningfully.
---
## 9. Fix priority (recommended order)
### P0 — correctness blockers
1. `Filters/Images.hs:110` — `lowerExt` bug. One-line fix (`takeExtension`). Restores the entire WebP pipeline.
2. `Commonplace.hs:126-131` — parenthesize the `if`. One line.
3. `tools/embed.py:68-73` — fix root URL. Three lines.
4. `Authors.hs:50` — add `content/essays/*/index.md` to `allContent`.
5. `Filters/Sidenotes.hs:38` — numeric labels (or error on >26).
### P1 — silent-failure hardening
6. `Filters/Score.hs:40` — missing file handling.
7. `Filters/Viz.hs:96` — missing file handling.
8. `Filters/Images.hs:77` — dedup `passedKvs` blacklist.
9. `Filters/Links.hs:59` — proper hostname match.
10. `tools/import-poetry.py:193` — escape newlines in YAML strings.
### P2 — accessibility
11. Dark-mode `--text-faint` contrast.
12. Gallery focus trap.
13. TOC collapsed-state keyboard access.
14. Global `:focus-visible` styles.
### P3 — hygiene and refactor
15. Missing CSS variables (`--rule`, `--font-ui`, `--bg-subtle`).
16. Consolidate duplicate `escapeHtml`/`trim`/`stringify`.
17. `README.md` with actual contents.
18. Delete `build/Metadata.hs` or populate.
19. Remove `-Wno-unused-imports` from `levineuwirth.cabal` and fix what surfaces.
20. Relocate `.docx` binaries out of repo root.
### P4 — nice to have
21. Reproducibility: SRI on CDN, pinned ONNX, tightened Python bounds.
22. Consolidate `Backlinks`/`Authors`/`Tags`/`Site` content patterns into a single `Patterns.hs`.
23. Defense-in-depth escaping in `Transclusion.hs`, `Catalog.hs`.
24. `make deploy` guard for `VPS_*` variables.
---
## Appendix A — files scanned in full
- **Haskell (build system):** `Main.hs`, `Site.hs`, `Contexts.hs`, `Stats.hs`, `Backlinks.hs`, `Compilers.hs`, `Citations.hs`, `Stability.hs`, `Catalog.hs`, `Commonplace.hs`, `Authors.hs`, `Tags.hs`, `Pagination.hs`, `SimilarLinks.hs`, `Utils.hs`, `Metadata.hs`, `Filters.hs`.
- **Haskell (filters):** `Filters/Images.hs`, `Filters/Transclusion.hs`, `Filters/Score.hs`, `Filters/Viz.hs`, `Filters/Sidenotes.hs`, `Filters/Links.hs`, `Filters/Wikilinks.hs`, `Filters/EmbedPdf.hs`. Others via parallel audit.
- **JavaScript:** 20 files under `static/js/` via parallel audit (prism.min.js excluded as vendor).
- **CSS:** 22 files under `static/css/` via parallel audit.
- **Templates:** `default.html`, `partials/head.html`, `partials/nav.html`, plus the full template tree via parallel audit.
- **Python:** `tools/embed.py`, plus `tools/import-poetry.py`, `tools/viz_theme.py` via parallel audit.
- **Shell:** `tools/convert-images.sh`, `tools/sign-site.sh`, `tools/download-model.sh`, `tools/subset-fonts.sh`, `tools/preset-signing-passphrase.sh`, `tools/refreeze.sh`, `Makefile`.
- **Config:** `levineuwirth.cabal`, `cabal.project`, `pyproject.toml`, `.gitignore`, `.env.example`.
## Appendix B — what was not audited
- `templates/partials/metadata.html`, `footer.html`, `page-footer.html`, `paginate-nav.html` — inspected briefly via the CSS/template sub-audit only.
- `static/css/build.css` — cited by the CSS audit for undefined variable usage; rules not fully traced.
- `data/*.bib`, `data/*.csl` — treated as data, not audited for CSL correctness.
- `content/**/*.md` — authored content, out of scope.
- `_site/`, `_cache/`, `dist-newstyle/`, `.venv/` — build outputs.
- `spec.md` — design document, referenced but not audited line-by-line.
- `prism.min.js`, `pagefind` output, KaTeX, Vega — vendor / third-party.
— End of audit —

620
audit_implementation.md Normal file

@ -0,0 +1,620 @@
# Audit Implementation Review — `audit-fixes` branch
**Reviewer:** Independent post-implementation review
**Date:** 2026-04-10
**Subject:** All uncommitted changes on the `audit-fixes` branch (working tree only — there are no commits ahead of `main` yet)
**Source of work being reviewed:** `audit.md` (independent code audit dated 2026-04-09)
**Method:** `git diff main` over every modified file, full read of new files, and a successful `cabal build` ("Up to date") to confirm no compile-breaking refactor was introduced.
This document answers two questions:
1. For each change introduced on `audit-fixes`, what audit finding (if any) does it address, and is the fix correct?
2. What regressions or net-negatives did the branch introduce?
The headline answer is: **the branch makes the codebase materially better and addresses the great majority of the audit's CRITICAL and HIGH findings correctly, but it has three real concerns.** First, the new `build/Patterns.hs` module — introduced specifically to close the H-1.10.1 "directory-form essays missed by Authors.hs" finding — is only adopted by three of the five modules that should consume it; `Stats.hs` and `Site.hs` still hold private patterns, so the same class of bug is partially perpetuated on the stats page. Second, an unrelated content file (`content/essays/modern_idolatry.md`) was moved out of the repo root into `content/essays/` instead of into `content/drafts/essays/`, which means it will publish to the live site on the next non-dev build despite its `status: "Draft"` frontmatter — this is an *introduced* risk that the audit did not warn about. Third, several "consolidation" refactors actually introduced new duplication (`percentDecode` is now byte-identical in two modules; `escAttr` is locally redefined in `Catalog.hs` and `Transclusion.hs` despite the new `Utils.escapeHtml`), and one fix in `Stats.hs` is annotated with a misleading "fixed" comment for code that was not actually changed.
The build compiles cleanly with the audit-fixes diff applied, so no refactor introduced an unbound name or type error.
---
## 1. Executive summary
### 1.1 Audit findings status
The audit identified one CRITICAL and ten HIGH findings, ~22 MEDIUM, ~36 LOW, and a handful of NIT items. The branch's coverage:
| Severity | Total | Fixed correctly | Partial / risk | Not addressed |
|----------|-------|-----------------|----------------|---------------|
| CRITICAL | 1 | 1 | 0 | 0 |
| HIGH | 10 | 9 | 1 | 0 |
| MEDIUM | ~22 | ~14 | ~3 | ~5 |
| LOW | ~36 | ~12 | ~4 | ~20 |
| NIT | ~10 | ~3 | 0 | ~7 |
(Counts are approximate where the audit aggregates multiple sub-findings under one ID.)
Every CRITICAL and every HIGH except one is addressed. The single HIGH that is partially addressed is **H-1.10.1** (directory-form essays omitted by `Authors.hs`): Authors.hs is fixed, but the new canonical `Patterns.hs` module was not adopted by `Stats.hs`, so the writing-statistics page still under-counts the same essays.
### 1.2 Top-level verdict per area
| Area | Files touched | Net assessment | Headline reason |
|------|---------------|----------------|-----------------|
| Haskell core (`build/*.hs`) | 13 modified, 1 new, 1 deleted | **Net positive, with caveats** | Most fixes correct. Stats.hs blaze rewrite is high-quality but exceeds audit scope and skips Patterns.hs adoption. New `percentDecode` duplication. One misleading "fixed" comment. |
| Pandoc filters (`build/Filters/*.hs`) | 9 modified | **Net positive** | All HIGH/MEDIUM filter findings addressed correctly. Sidenote rewrite is the highest-risk change but verified. New local `escAttr` in Transclusion partially undercuts the consolidation goal. |
| JavaScript (`static/js/*.js`) | 9 modified, 2 new | **Net positive** | All scoped HIGH/MEDIUM findings addressed. Silent removal of TOC auto-collapse-on-scroll is an unannounced UX change. |
| CSS (`static/css/*.css`) | 5 modified | **Net positive** | Three HIGH a11y findings closed. Several declared design tokens (`--transition-medium`, `--bp-*`) are dead — defined but not consumed. |
| Templates | 3 modified | **Net positive** | KaTeX bootstrap externalized; `type="button"` added; new `utils.js` wired in correctly. |
| Tools / Makefile / config | 11 modified | **Net positive** | Deploy ordering fixed; `.env.example` documents the new vars; Python imports hardened; `download-model.sh` gains checksum verification. README expanded from one line to a usable document. |
| Content (essay move + new content) | 1 deleted, 1 new dir + figures, several untracked | **Mixed** | BCI essay rewrite is solid; figures and citations all resolve. **`content/essays/modern_idolatry.md` will accidentally publish** because it's matched by `publishedEssays`. |
### 1.3 The three concerns to take action on
1. **Partial adoption of `build/Patterns.hs`.** `Stats.hs` (lines 747 and 901) and `Site.hs` (`publishedEssays`/`draftEssays`) still maintain their own essay patterns instead of importing from `Patterns`. The audit's L-1.1.2 finding said "extract one canonical `Patterns.hs`" — it was extracted, but two of five candidate consumers still bypass it. The H-1.10.1 fix (directory-form essays in author indexes) is therefore incomplete on the build/stats page. **Recommendation:** Switch the two `loadAll` calls in `Stats.hs:747,901` to use `Patterns.essayPattern`, and replace `Site.hs`'s `publishedEssays` with `Patterns.essayPattern`.
2. **`content/essays/modern_idolatry.md` will publish on the next build.** The Hakyll publication gate is **path-based** (`content/essays/**/*.md` is matched by `publishedEssays` in `build/Site.hs:23-24`), not frontmatter-based. The file's `status: "Draft"` frontmatter is metadata only — it does not exclude the file from the build. **Recommendation:** Move to `content/drafts/essays/modern_idolatry.md` (the directory does not yet exist on disk and will need to be created) before any non-dev build.
3. **Misleading "fixed" comment in `Stats.hs` for L-1.3.4.** A new comment claims that the Backlinks-decode round-trip was eliminated, but the underlying `Aeson.decodeStrict (TE.encodeUtf8 (T.pack rawBL))` code is byte-identical to `main`. **Recommendation:** Remove the comment or actually load the JSON as `ByteString` from a custom compiler.
The rest of this document walks through the changes file-by-file, in five sections.
---
## 2. Haskell core build modules
### 2.1 New: `build/Patterns.hs`
A clean new module that exports `essayPattern`, `blogPattern`, `poetryPattern`, `fictionPattern`, `musicPattern`, `standalonePagesPattern`, and four aggregations (`allWritings`, `allContent`, `authorIndexable`, `tagIndexable`). `essayPattern` correctly includes both `content/essays/*.md` and `content/essays/*/index.md`. `poetryPattern` correctly excludes collection `index.md` files via `complement`. `authorIndexable` and `tagIndexable` apply `hasNoVersion` so the "links" version produced by `Backlinks.hs` is not double-indexed.
- **Audit findings addressed:** L-1.1.2 (pattern centralization). Indirectly enables H-1.10.1 to be closed in modules that adopt it.
- **Verdict:** The module is the right abstraction and is cleanly implemented. The problem is partial adoption (see 2.4 below).
### 2.2 Deleted: `build/Metadata.hs`
The empty placeholder module is deleted; the cabal `other-modules` entry is removed in the same diff. Confirmed via `cabal build` that no remaining file imports it (the 30+ grep hits for "Metadata" all reference Hakyll's `Hakyll.Core.Metadata` module/type, not the deleted local one).
- **Audit findings addressed:** Hygiene line item.
- **Verdict:** Clean deletion.
### 2.3 `build/Authors.hs`
Drops the local `authorLinksField` (relocated to `Contexts.hs`), uses `Patterns.authorIndexable` in place of the hand-rolled `allContent`, delegates `slugify`/`nameOf` to `Utils.authorSlugify`/`authorNameOf`, and adds `abstractField` to the per-item context.
- **Audit findings addressed:** **H-1.10.1** (directory-form essays now indexed on author pages), **L-1.10.2** (slugify deduplicated), **L-1.1.2** (shares the new canonical pattern).
- **Risks:** Adding `abstractField` to the author item context is a minor template contract change — author-page templates that previously had no `$abstract$` will now receive one. Worth a quick template spot-check.
- **Verdict:** Net positive. Closes the highest-impact author-page bug correctly.
### 2.4 `build/Stats.hs` (the largest single diff: ~720 lines)
This is the most ambitious change in the branch. It rewrites the HTML-generating helpers to use **`blaze-html`** throughout, introduces a defense-in-depth `isSafeUrl`/`safeHref`/`link`/`pageLink` URL allowlist, replaces the naive `stripHtmlTags` with a small state machine (handling tag bodies, comments, CDATA, quoted attribute values), adds `pathIsSymbolicLink` skipping to `walkDir`, switches `countLinesDir` to strict `TIO.readFile`, replaces a partial ``s !! (length s `div` 2)`` `median` with a total pattern match, and aliases `renderStatsTags = renderTagsSection` to collapse the duplicate. Two new cabal dependencies (`blaze-html`, `blaze-markup`) accompany this change.
- **Audit findings addressed:** **M-1.3.1** (naive `stripHtmlTags`), **M-1.3.2** (no symlink protection), **L-1.3.3** (protocol-relative URL allowlist hole), **L-1.3.5** (duplicate tag rendering function), **L-1.3.6** (lazy `readFile`).
- **Real concerns:**
1. **`Stats.hs` did NOT adopt `Patterns.hs`.** Lines 747 and 901 still call `loadAll ("content/essays/*.md" .&&. hasNoVersion)`, which means the writing-statistics page still under-counts directory-form essays. Verified by direct grep on the current file. This perpetuates the exact class of bug that **H-1.10.1** was meant to close.
2. **L-1.3.4 has a misleading "fixed" comment.** A new comment describes "decoding directly from the encoded UTF-8 bytes [to] avoid the previous String → Text → ByteString round-trip", but the underlying code is byte-identical to `main` (`Aeson.decodeStrict (TE.encodeUtf8 (T.pack rawBL))`). Either the comment should be removed or the JSON should be loaded as `ByteString` from a custom compiler.
3. **Scope creep.** The audit only asked for a smarter `stripHtmlTags`. The blaze rewrite is a substantial quality improvement (HTML escaping is now structural rather than manual) and it does subsume several other findings, but it triples the line count of the affected functions and adds two cabal dependencies. This is a defensible call but it expanded the review surface considerably and any regression in the rendered `/build/` and `/stats/` pages will live in this code.
4. **Cosmetic dedup of tag function.** `renderStatsTags = renderTagsSection` is two names pointing to the same body — the surface area is unchanged. The audit recommended deleting one caller, not aliasing them.
- **Other observations:** The new heatmap CSS classes (`.hm0`...`.hm4`, `.hm-lbl`) were moved out of an inline `<style>` block into `static/css/build.css` — verified that the corresponding rules are present in `build.css:125-140`. The state machine in the new `stripHtmlTags` correctly handles comments and CDATA but does not strip `<script>`/`<style>` content — not a problem because blaze never emits raw script/style on this page.
- **Verdict:** **Mixed.** The blaze rewrite is a real engineering improvement, but the failure to adopt `Patterns.hs`, the misleading comment, and the surface area expansion mean this file should be the focus of any post-merge regression check. The `_site/build/` and `_site/stats/` pages should be visually compared to a known-good build before deploy.
### 2.5 `build/Contexts.hs`
Re-homes `tagLinksField` (from `Tags.hs`) and `authorLinksField` (from `Authors.hs`); adds `abstractField` to the exports; switches author-related helpers to `Utils.authorSlugify`/`authorNameOf`. **Filters out empty author entries** in `authorLinksField` (closing M-1.2.1). **Splits `parseMovements` into `parseMovementsWithWarnings`** that warns via `unsafeCompiler`/`hPutStrLn stderr` for any malformed movement entry, with a thin `parseMovements = fst . parseMovementsWithWarnings` compatibility wrapper (closing M-1.2.2). **Replaces partial `xs !! (length xs - 2)` and `last xs`** in `confidenceTrendField` with a total `lastTwo` helper, and factors out the magic `5` as a named `trendThreshold` constant (closing L-1.2.4 and the partial-function variant of L-1.2.3).
- **Audit findings addressed:** M-1.2.1, M-1.2.2, L-1.2.4. The partial-function refactor incidentally improves M-1.2.3.
- **Risks:** L-1.2.3 (`abstractField` only strips single-`Para` abstracts) is **not addressed** — the same pattern match is still present. L-1.2.5 (`pageScriptsField` collision risk on shared script paths) is **not addressed**. The relocation of `tagLinksField` to `Contexts.hs` introduces a new latent drift axis: the new copy hard-codes `fromFilePath (t ++ "/index.html")` instead of going through `Tags.tagFilePath`. If `tagFilePath` ever changes, the Contexts copy will silently diverge. Worth a `-- keep in sync with Tags.tagFilePath` comment.
- **Verdict:** Net positive on the in-scope items, with two LOW items left on the table.
LN: we will address the remaining two items.
### 2.6 `build/Stability.hs`
`readIgnore` switches to strict `TIO.readFile`. `gitDates` now captures and logs `stderr` to `hPutStrLn stderr` on both success (non-empty stderr) and failure paths. `stabilityFromDates` replaces partial `head dates`/`last dates` with a total pattern match using `reverse` to find the oldest commit. The classification thresholds (e.g., `n <= 5 && age < 90` → "revising") are extracted as named constants with comments.
- **Audit findings addressed:** **M-1.7.1** (lazy `readFile`), **L-1.7.3** (stderr logging). The threshold-as-constants refactor closes the implicit complaint about magic numbers.
- **Not addressed:** L-1.7.2 (`unsafeCompiler` for git breaks Hakyll dep tracking) — explicitly deferred; meaningful remediation would require tracking `.git/HEAD` in Hakyll's dep graph, which is beyond the audit's scope.
- **Verdict:** Net positive.
### 2.7 `build/Catalog.hs`
Adds local `safeHref`, `escAttr`, and `escText` helpers, and applies them to `ceUrl`, `ceYear`, `ceDuration`, `ceInstrumentation`, and `categoryLabel` inside `renderEntry`/`renderCategorySection`. Replaces the partial `head g` in `renderCategorySection` with a total pattern match `renderGroup (e : _) = ...`. `ceTitle` is still emitted unescaped by design, with a trust-model comment added.
- **Audit findings addressed:** **M-1.8.1** (frontmatter escaping with documented trust caveat).
- **Risks:** **`safeHref`, `escAttr`, and `escText` are local re-implementations** of helpers that the audit explicitly asked to consolidate. `safeHref` is now the **third** copy of the URL allowlist (also in `Stats.isSafeUrl`); `escAttr` is essentially a copy of the new `Utils.escapeHtml`/`escapeHtmlText`. The branch added centralized helpers in `Utils.hs` and then immediately bypassed them here. The fix is correct in isolation but contradicts the consolidation goal.
- **Verdict:** Positive for the security fix, mixed on the duplication.
LN: let's try to bring this to an entire net positive.
### 2.8 `build/Backlinks.hs` and `build/SimilarLinks.hs`
Both files add a new `percentDecode` function that decodes `%XX` escapes into raw bytes and reinterprets them as UTF-8 (with lenient decoding) and call it from their respective `normaliseUrl` functions. `Backlinks.hs` additionally switches its local `allContent` to `Patterns.allContent`.
- **Audit findings addressed:** **L-1.4.1** (URL-decode in `normaliseUrl`); the `Patterns.allContent` adoption closes part of L-1.1.2.
- **Risks:** **`percentDecode` is byte-for-byte duplicated** between `Backlinks.hs` and `SimilarLinks.hs`. The in-diff justification is that the two modules apply different *pre-normalisation* steps, which is true, but the decoder function itself is identical. This should live in `Utils.percentDecode`. The audit was explicit about exactly this kind of drift; the fix introduces a new instance of it. Additionally, `Stats.hs`'s own `normUrl` does NOT percent-decode, so if a route ever contained a percent-encoded character, the orphan-link counts on `/stats/` would silently disagree with `Backlinks.hs`. In practice Hakyll routes are ASCII so this doesn't bite, but it's a latent asymmetry.
- **Verdict:** Net positive (the fix is correct and improves consistency between Backlinks and SimilarLinks); the missed factoring is a paper-cut, not a regression.
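
For reference, the decoder behavior described above (decode `%XX` escapes to raw bytes, then reinterpret the bytes as UTF-8 with lenient decoding) can be sketched in a few lines. This is a Python illustration of the logic only, not a translation of the Haskell in either module; the name `percent_decode` and the choice to pass malformed escapes through unchanged are assumptions.

```python
def percent_decode(url: str) -> str:
    """Decode %XX escapes to raw bytes, then read the bytes as UTF-8."""
    hex_digits = "0123456789abcdefABCDEF"
    raw = url.encode("utf-8")
    out = bytearray()
    i = 0
    while i < len(raw):
        # A valid escape is '%' (0x25) followed by exactly two hex digits.
        if (raw[i] == 0x25 and i + 2 < len(raw)
                and chr(raw[i + 1]) in hex_digits
                and chr(raw[i + 2]) in hex_digits):
            out.append(int(raw[i + 1:i + 3].decode("ascii"), 16))
            i += 3
        else:
            # Assumption: malformed escapes are kept literally.
            out.append(raw[i])
            i += 1
    # Lenient decoding: invalid byte sequences become U+FFFD.
    return out.decode("utf-8", errors="replace")
```

Because the function has no dependency on either module's pre-normalisation steps, a single shared copy would serve both call sites.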
### 2.9 `build/Citations.hs`
`transformInline` replaces partial `head keys`/`head nums` with a total pattern match `(firstKey : _, firstNum : _)`, falling through to `Str ""` otherwise.
- **Audit findings addressed:** L-1.5.1.
- **Verdict:** Net positive. Minimal, correct, semantics-preserving.
### 2.10 `build/Commonplace.hs`
Fixes the H-1.9.1 operator-precedence bug: parentheses are added around the `if`-expression so the closing `</div>` is always emitted in `renderChronoView`. A two-character fix.
- **Audit findings addressed:** H-1.9.1.
- **Verdict:** Net positive. Trivial, correct.
### 2.11 `build/Compilers.hs`
Removes the now-redundant `import Hakyll.Core.Metadata (lookupStringList, lookupString)` since `Hakyll` (the umbrella module) re-exports both.
- **Verdict:** Net positive. Pure housekeeping.
### 2.12 `build/Tags.hs`
Drops the local `tagLinksField` export (relocated to Contexts), drops the local `allContent` in favor of `Patterns.tagIndexable`, adds `abstractField` to `tagItemCtx`, and removes the unused `Pagination (pageSize)` import.
- **Audit findings addressed:** L-1.1.2 (pattern centralization on the tags side).
- **Risks:** As noted above, the `tagLinksField` relocation creates a latent drift axis with `Tags.tagFilePath`.
- **Verdict:** Net positive.
### 2.13 `build/Site.hs`
Adds **draft-essay support**: a new `SITE_ENV=dev` env-var gate (read once at rule-registration via `preprocess $ lookupEnv "SITE_ENV"`) that, when set to `"dev"`, includes `content/drafts/essays/**.md` in the `allEssays` pattern and routes them to `drafts/essays/...`. New rules also handle co-located JS and static assets under `content/drafts/essays/`. The `intercalate` import is deleted because the `random-pages.json` builder now uses `Aeson.encode`, which produces valid JSON, rather than a hand-rolled `intercalate ","` (a real correctness improvement, though not in the audit).
- **Audit findings addressed:** None directly. This is a **new feature** unrelated to the audit (draft mode), and one collateral correctness improvement (the random-pages JSON).
- **Risks:**
1. **Site.hs does NOT use `Patterns.hs`.** The new `publishedEssays` and `draftEssays` definitions are in `Site.hs`, which means there are now *two* sources of truth for the essay pattern: `Site.hs.publishedEssays` and `Patterns.essayPattern`. They happen to be string-equivalent today, but if either is edited the other will silently drift. Recommend `import qualified Patterns as P` and `let publishedEssays = P.essayPattern`.
2. **Scope creep.** Draft mode is a new feature, not an audit fix. It's implemented cleanly (env-var gated, no cross-module filtering needed because every existing pattern only matches `content/essays/**`), but it adds complexity to the rules registration that the audit did not anticipate.
3. The `random-pages.json` JSON-encoder fix is a quiet but real correctness improvement: the previous `intercalate "," . map show` was technically invalid for any URL containing a backslash or non-ASCII character. Worth a commit-message callout.
- **Verdict:** Net positive. The feature is valuable, the JSON fix is a real (silent) bug fix, and the draft-mode design is sound. The `Patterns.hs` non-adoption is the only smell.
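
To illustrate why routing the URL list through a real encoder matters, here is a minimal Python sketch. `json.dumps` stands in for `Aeson.encode`, and `naive_json_array` is a hypothetical approximation of a hand-rolled join; it is not the Haskell `show`-based code that was removed, whose exact escaping differs.

```python
import json


def naive_json_array(urls):
    # Hand-rolled approach: wrap each string in quotes and join with commas.
    # Nothing inside the strings is escaped.
    return "[" + ",".join('"' + u + '"' for u in urls) + "]"


def encoded_json_array(urls):
    # A real encoder escapes quotes, backslashes, and control characters.
    return json.dumps(urls, ensure_ascii=False)


def round_trips(serialize, urls):
    """True if the serialized form parses back to the original list."""
    try:
        return json.loads(serialize(urls)) == urls
    except json.JSONDecodeError:
        return False
```

With plain ASCII paths both serializers agree, which is why a bug of this kind can stay silent until a URL with a quote or backslash appears.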
### 2.14 `build/Utils.hs`
Adds `escapeHtmlText` (a `Text` variant of the existing `escapeHtml`), `trim`, `authorSlugify`, and `authorNameOf`.
- **Audit findings addressed:** L-1.10.2 (centralized author slugify), enabling several downstream consolidation fixes.
- **Risks:** L-1.11.1 (`wordCount` counts HTML tokens as words) is **not addressed** — the function is unchanged. The new `escapeHtml` comment notes that "ordering matters" for the replacements but the implementation is `concatMap escChar`, which is character-by-character — the order does NOT matter for this implementation pattern (only for sequential `T.replace`). Misleading but harmless.
- **Verdict:** Net positive. Small, focused, makes downstream consolidation possible — even though several callers immediately ignored the new helpers and rolled their own.
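
The point about replacement order can be shown concretely. The sketch below is Python, not the Haskell in `Utils.hs`, and the escape table (including the `'` entry) is an assumption made for illustration. A per-character map never re-reads its own output, so its order is irrelevant; chained whole-string replacements do re-read it, so `&` must go first.

```python
# Assumed escape table, for illustration only.
ESCAPES = {"&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#39;"}


def escape_per_char(s):
    # Analogue of a concatMap-style escaper: each input character is
    # translated exactly once, and the output is never scanned again.
    return "".join(ESCAPES.get(ch, ch) for ch in s)


def escape_sequential(s, order):
    # Analogue of chained whole-string replaces: later replacements see
    # the output of earlier ones, so the order of `order` matters.
    for ch in order:
        s = s.replace(ch, ESCAPES[ch])
    return s
```

Running `escape_sequential` with `&` last double-escapes the entities produced for `<` and `>`, which is the failure mode an "ordering matters" comment would be guarding against.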
---
## 3. Pandoc filters
All filter changes are summarized below; the full per-finding verification was performed and is condensed for readability.
### 3.1 `Filters/Images.hs` (closes the CRITICAL)
- **C-2.1.1 fixed correctly.** `lowerExt` is now `map toLower . takeExtension`. Edge cases verified: extensionless files (`"Makefile"`) → `""` (no raster); dotfiles (`".hidden"`) → `".hidden"` (`System.FilePath` treats a leading-dot name as all extension; not a raster extension, so skipped); double-extension (`"foo.tar.gz"`) → `".gz"`; trailing dot (`"foo."`) → `"."` (skipped). The entire `<picture>`/WebP pipeline is now live for the first time since whenever the regression was introduced. **This is the single most impactful fix in the branch.**
- **M-2.1.2 fixed correctly.** `passedKvs` blocklist expanded to include `id`, `class`, `alt`, `title`, `src` in addition to `loading`/`data-lightbox`. One subtle behavior change: an author who previously wrote `![](foo.jpg){src="bar.jpg"}` would have gotten a duplicate `src` attribute on the rendered `<img>`; now the user-supplied `src` is silently dropped. This is an improvement (the Pandoc Target is the canonical source), but it is a behavioral change worth noting.
- **M-2.1.3 fixed correctly.** Local `stringify` expanded to cover `Strikeout`, `Superscript`, `Subscript`, `SmallCaps`, `Underline`, `Quoted`, `Cite`, `Math`, `RawInline`, `Note`. `Math` returns the raw math source (e.g., `x^2`), which is uglier than `[math]` but better than empty.
- **L-2.1.4 not addressed.** `renderKvs` still emits the attribute key without escaping. Defensive only; in practice unreachable since Pandoc kv keys are alphanumeric identifiers.
- **Verdict:** Net positive. The `lowerExt` fix alone justifies this file's diff.
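
A small Python sketch of the corrected extension logic follows, for readers who want to replay the edge cases. It imitates "take the extension, then lowercase it" on the final path component; the `RASTER_EXTS` set and the remote-URL check in `is_local_raster` are assumptions, since the real lists live in `Filters/Images.hs`.

```python
# Assumed set of raster extensions, for illustration only.
RASTER_EXTS = {".jpg", ".jpeg", ".png", ".gif"}


def lower_ext(path):
    """Extension of the last path component (dot included), lowercased."""
    name = path.rsplit("/", 1)[-1]
    dot = name.rfind(".")
    return name[dot:].lower() if dot != -1 else ""


def is_local_raster(path):
    # Assumption: anything with an http(s) or protocol-relative prefix is remote.
    if path.startswith(("http://", "https://", "//")):
        return False
    return lower_ext(path) in RASTER_EXTS
```

The original bug returned the part before the dot, so no path could ever land in the raster set; with the extension taken correctly, `figures/fig2a.PNG` qualifies and `foo.` does not.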
### 3.2 `Filters/Score.hs` (H-2.3.1)
`doesFileExist` guard plus `try :: IO (Either IOException T.Text)` catch around `TIO.readFile`. On missing file or read error, logs the path to stderr and returns an `errorBlock` (`<figure class="score-fragment score-fragment--error">`) instead of crashing the build. Local `escHtml` delegates to `Utils.escapeHtmlText`.
- **Verdict:** Net positive. Turns a build-aborting crash into a visible diagnostic. Note: `score-fragment--error` is a new CSS class with no corresponding rule yet — the figure will render unstyled, which is arguably the intent.
LN: for now, that is the intent indeed.
### 3.3 `Filters/Viz.hs` (H-2.4.1, L-2.4.4)
Adds `doesFileExist` check before `readProcessWithExitCode`. Enriches error messages to prefix the source path. Switches `warn` from `putStrLn` to `hPutStrLn stderr`. Local `escHtml` delegates to `Utils.escapeHtmlText`.
- **Verdict:** Net positive. Tiny TOCTOU gap between `doesFileExist` and the subprocess call is irrelevant for a static site build.
### 3.4 `Filters/Sidenotes.hs` (H-2.5.1, M-2.5.2 — highest-risk filter change)
- **H-2.5.1 fixed correctly.** `toLabel` is rewritten from `(n - 1) mod 26` to a base-26 expansion: `1`→`a`, `26`→`z`, `27`→`aa`, `702`→`zz`, `703`→`aaa`. Verified by hand: `n=27` → `(26 divMod 26) = (1,0)` → recurse on `k=1` → `(0,0)` → `"a"`, append `'a'` → `"aa"`. No collisions, guaranteed unique. JS in `static/js/sidenotes.js` derives `snref-N` from `id.slice(3)`, which works for any label length, so there is no JS-side regression.
- **M-2.5.2 fixed correctly.** The string-level `T.replace "<p>"` hack is replaced with an AST-level `blocksToInlineHtml` that renders each `Para` via Pandoc's HTML writer with a wrapping `<span class="sidenote-para">`. Multi-paragraph footnotes containing the literal text `<p>` (e.g., a code sample about HTML) are no longer mangled.
- **Risks:** Multi-paragraph sidenotes with non-`Para` block content (lists, blockquotes, code blocks) fall through to a `blocksToHtml [b]` path that emits block-level `<p>`/`<ul>` etc. inside a `<span>` — technically invalid HTML but unlikely to appear in practice. `inlinesToHtml` silently returns `T.empty` on Pandoc-writer error (should warn). Three-letter labels (`aaa`+) at >702 footnotes may overflow `.sidenote-num` CSS layout; authors with such prolific footnoting will hit this before the audit's correctness fix matters.
- **Verdict:** Net positive. Both fixes are structurally correct. Worth a spot-check on a page with a multi-paragraph or list-containing footnote post-deploy.
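
The label scheme checked by hand above is a bijective base-26 numbering (there is no zero digit, so `z` is followed by `aa`). A minimal Python sketch of that scheme is below; it mirrors the `divMod (n - 1) 26` step quoted in the verification but is not the Haskell source, and the `ValueError` for `n < 1` is an assumption.

```python
def to_label(n):
    """Map 1, 2, ... to a, b, ..., z, aa, ab, ... (bijective base 26)."""
    if n < 1:
        raise ValueError("labels start at 1")  # assumed behavior
    letters = []
    while n > 0:
        # Subtracting 1 first is what makes the numbering bijective.
        n, r = divmod(n - 1, 26)
        letters.append(chr(ord("a") + r))
    # Digits were produced least-significant first.
    return "".join(reversed(letters))
```

Every positive integer gets a distinct label, which is the uniqueness property the old `mod 26` version lost after the 26th footnote.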
### 3.5 `Filters/Transclusion.hs` (M-2.2.1, L-2.2.2)
HTML-escapes both `url` and `sec` via a new local `escAttr` before interpolation into `data-src`/`data-section`. `slugToUrl` becomes idempotent for slugs already ending in `.html`.
- **Risks:** **`escAttr` is locally redefined** despite the new `Utils.escapeHtml`. This module works on `String`, and `Utils.escapeHtml` also works on `String`, so there is no type-mismatch excuse. A clean miss of the consolidation goal.
- **Verdict:** Net positive on the security/idempotency fixes; minor regression on the duplication front.
### 3.6 `Filters/Links.hs` (M-2.6.1, M-2.6.2)
`isExternal` rewritten to extract hostname properly: strip `http(s)://`, take up to first `/?#`, drop `:port`, lowercase, then exact-match `levineuwirth.org` or `.levineuwirth.org` suffix. Verified test cases: `https://evil-levineuwirth.org.attacker.com/` → `external` (correct, fixes the audit finding); `https://www.levineuwirth.org/` → `internal`; `https://LEVINEUWIRTH.ORG/` → `internal`. PDF links with fragments now split on `#`, encode only the path through `encodeQueryValue`, and re-attach the fragment to the viewer URL.
- **Minor issue:** `extractHost` does not handle URLs with userinfo (`https://user:pass@host/`) — `host` would be parsed as `"user"`. No realistic content uses credentials in URLs; non-blocking.
- **Verdict:** Net positive. The hostname parsing is the security-relevant fix and it's correct.
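
The hostname steps listed above can be replayed with a short Python sketch. It follows the described sequence (strip scheme, cut at the first `/`, `?`, or `#`, drop the port, lowercase, then exact or dot-suffix match) and deliberately reproduces the userinfo limitation noted under "Minor issue". Treating non-`http(s)` links as internal is an assumption; the function names are illustrative, not the Haskell ones.

```python
import re

SITE_HOST = "levineuwirth.org"


def extract_host(url):
    """Return the lowercased host of an http(s) URL, or None otherwise."""
    m = re.match(r"https?://", url, flags=re.IGNORECASE)
    if m is None:
        return None
    rest = url[m.end():]
    # Authority ends at the first '/', '?', or '#'.
    authority = re.split(r"[/?#]", rest, maxsplit=1)[0]
    # Drop ':port'. Userinfo is NOT handled, matching the caveat above.
    host = authority.split(":", 1)[0]
    return host.lower()


def is_external(url):
    host = extract_host(url)
    if host is None:
        return False  # assumption: relative/non-http links count as internal
    return not (host == SITE_HOST or host.endswith("." + SITE_HOST))
```

The dot in the suffix check is what separates `www.levineuwirth.org` from look-alikes such as `evil-levineuwirth.org`.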
### 3.7 `Filters/Wikilinks.hs` (M-2.7.1)
Adds `escMdLinkText` (escapes `\`, `[`, `]` in display text) and `encodeUrl` (percent-encodes `(`, `)`, space). Switches local `trim` to `Utils.trim`.
- **Risks:** `encodeUrl` is essentially **dead code**: the URL it processes is `"/" ++ slugify title`, and `slugify` only outputs `[a-z0-9-]`, none of which need encoding. Defensive without payoff. **L-2.7.2 (slugify trailing-period quirk) is not addressed.** Switching to `Utils.trim` is also a minor semantics drift: the old local `trim` only stripped ASCII space, the new one strips all whitespace via `isSpace` (so tabs in wikilinks are now trimmed where they were preserved before). Almost certainly fine.
- **Verdict:** Mixed. Display-text escaping is correct; URL encoding is over-engineered; LOW finding silently skipped.
LN: let's comprehensively revisit this.
### 3.8 `Filters/EmbedPdf.hs` (M-2.8.1)
Adds `#` to the `encodeQueryValue` encode table; switches local `trim` to `Utils.trim`.
- **Verdict:** Net positive. Small, correct, defense-in-depth.
### 3.9 `Filters/Smallcaps.hs`
Local `escHtml` delegates to `Utils.escapeHtmlText`. Pure cleanup; no behavior change.
- **Verdict:** Net positive.
### 3.10 Filter cross-cutting observations
**`escapeHtml` consolidation is partial.** `Utils.escapeHtmlText` now serves `Images.hs`, `Smallcaps.hs`, `Score.hs`, `Viz.hs`. But `Filters/Typography.hs` still has its own local `escHtml` (untouched), `Filters/Transclusion.hs` introduces a *new* local `escAttr` instead of using `Utils.escapeHtml`, and `Catalog.hs` defines its own `escText`/`escAttr`. **Net result: 4 files consolidated, 3 files (Typography, Transclusion, Catalog) still have duplicates.** The audit's NIT about duplicate escape helpers was addressed by half.
**`trim` is also fragmented.** `Utils.trim` is added and used by `Wikilinks`, `Transclusion`, `EmbedPdf`. But `Contexts.hs` still uses Hakyll's re-exported `Hakyll.Core.Util.String.trim` via the umbrella `import Hakyll`. Two trim functions in active use, with equivalent semantics — no break, but the consolidation goal is incomplete.
**No new partial functions introduced.** All new IO code in Score/Viz uses `try`/`catch` or `doesFileExist` guards. No bare `head`, `tail`, `!!`, or `fromJust` added in any filter diff.
**No type-signature changes ripple to callers.** Module interfaces (`apply`, `inlineScores`, `inlineViz`) are unchanged. `Site.hs`, `Compilers.hs`, and the Filters umbrella module (`build/Filters.hs`) are unaffected.
LN: all should be addressed, but we can make the other fixes in my comments here first, then discuss this in more depth to get it right.
---
## 4. JavaScript, CSS, and templates
### 4.1 New: `static/js/utils.js`
Single-function module exposing `window.lnUtils.escapeHtml(s)` (escapes `&<>"'`, with `'` newly added relative to all three previous duplicates). Wrapped in an IIFE that guards against double-assignment. Loaded synchronously from `templates/partials/head.html:31` *before* `theme.js` and before every defer'd consumer, so `window.lnUtils` is guaranteed to exist by the time `annotations.js`, `popups.js`, `semantic-search.js` run.
- **Audit findings addressed:** L-3.4.1.
- **Verdict:** Net positive. The shared helper is strictly safer than the previous three copies (it escapes single quotes, which the old ones did not).
### 4.2 New: `static/js/katex-bootstrap.js`
Externalizes the inline `onload="(function(){...})()"` KaTeX render loop that used to live on `templates/default.html`. Adds a `try/catch` around `katex.render` and a `typeof katex === 'undefined'` early-out that the inline version lacked. Loaded with `defer` after the KaTeX CDN script, also `defer`ed — defer scripts execute in document order, so KaTeX is guaranteed to be defined before the bootstrap runs.
- **Audit findings addressed:** M-4.2.1 (CSP compatibility).
- **Risks:** **One behavioral change:** the new bootstrap renders both `<span class="math">` and `<div class="math">`; the old inline script only rendered `SPAN`. Pandoc's default math output is `<span class="math display">` so this is unlikely to bite, but display-math edge cases should be spot-checked in a build.
- **Verdict:** Net positive.
### 4.3 `static/js/gallery.js` (H-3.3.1)
Adds a `Tab` branch in the existing overlay keydown listener that calls a new `trapTab(e)` cycling focus through `button:not([disabled]), [tabindex]:not([tabindex="-1"])` inside `#gallery-overlay`. Mirrors the `settings.js:33-49` pattern and additionally snaps focus back into the overlay if `document.activeElement` is outside it entirely. Listener is guarded by `overlay.hasAttribute('hidden')` so the trap is inert when the overlay is closed.
- **Verdict:** Net positive. A minor doc-drift: a comment refers to "(currently inert) page background" but the overlay does not actually set `inert` or `aria-hidden` on `document.body`, so a screen reader could still navigate the page beneath the overlay in virtual-cursor mode. Cosmetic.
### 4.4 `static/js/popups.js` (M-3.1.1, M-3.1.2, M-3.2.2, L-3.4.1)
Five distinct fixes: (1) new `window.reinitPopups(container)` for transcluded content; (2) `bind()` is now idempotent via `el.dataset.popupBound === '1'`; (3) `scheduleShow` accepts either a string or a `Node` from providers; (4) `epistemicContent` returns a `<div>` built from `cloneNode(true)` of `.ep-compact`/`.ep-expanded` instead of concatenating `innerHTML`; (5) new `fetchJson`/`fetchXml` helpers validate `Content-Type` before parsing, and all cross-origin fetches (Wikipedia, arXiv, CrossRef, GitHub, Forgejo, OpenLibrary, bioRxiv, YouTube, archive, PubMed) are routed through them. `esc()` delegates to `window.lnUtils.escapeHtml`.
- **Verdict:** Net positive. The `Content-Type` matchers are sound: the JSON regex matches `application/json`, `application/ld+json`, and `application/vnd.github.v3+json`; the XML matcher accepts `application/atom+xml`. The idempotent `bind()` guard means transcluded content can be re-initialized without handler accumulation.
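
As a way to reason about the `Content-Type` matchers, here is a Python sketch of patterns with the acceptance behavior described above. The regular expressions are hypothetical reconstructions chosen to accept the listed media types; the actual expressions in `popups.js` may differ.

```python
import re

# Hypothetical patterns: a bare subtype or a structured "+json"/"+xml" suffix.
JSON_TYPE = re.compile(r"^application/(?:[\w.-]+\+)?json\b", re.IGNORECASE)
XML_TYPE = re.compile(r"^(?:application|text)/(?:[\w.-]+\+)?xml\b", re.IGNORECASE)


def is_json(content_type):
    """True if the header value names a JSON media type."""
    return bool(content_type) and JSON_TYPE.match(content_type.strip()) is not None


def is_xml(content_type):
    """True if the header value names an XML media type."""
    return bool(content_type) and XML_TYPE.match(content_type.strip()) is not None
```

Anchoring at the start and ending on a word boundary keeps parameters such as `; charset=utf-8` acceptable while rejecting near-misses like `application/jsonp`.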
### 4.5 `static/js/sidenotes.js` (M-3.2.1)
Adds an idempotent guard in `wireHover` using `ref.dataset.snBound`. Extracts `wireAll(root)` from `init` and exposes `window.reinitSidenotes(container)`.
- **Risks:** **M-3.3.3 (sidenote focus toggle is click-only) is not addressed.** No keyboard handler is added to toggle the `.is-focused` class. The audit listed this as MEDIUM; it remains open.
- **Verdict:** Net positive on M-3.2.1 only.
LN: we need to resolve this MEDIUM open problem.
### 4.6 `static/js/semantic-search.js` (M-3.2.3, L-3.4.1)
Adds `loadModelPromise` in-flight cache for the model-loading `import(CDN)` chain; resets the cache on failure so retries work. This is the usual in-flight promise memoization pattern: concurrent callers share one pending promise instead of starting duplicate loads. `esc()` delegates to `window.lnUtils.escapeHtml`.
- **Verdict:** Net positive.
### 4.7 `static/js/annotations.js`, `static/js/lightbox.js`, `static/js/theme.js`
- `annotations.js`: `escHtml` delegates to `window.lnUtils.escapeHtml`. (L-3.4.1.)
- `lightbox.js`: `img.alt` initial value becomes `'Lightbox image'`; `open()` uses `alt || captionText || 'Lightbox image'` fallback chain. (L-3.3.4.)
- `theme.js`: New `safeGet(key)` wraps `localStorage.getItem` in try/catch for Safari private mode. (L-3.3.5.)
- **Verdict:** All net positive. The matching `setItem` writes performed elsewhere (`settings.js`) are not wrapped — minor inconsistency, not in scope here.
LN: let's bring them up to consistency.
### 4.8 `static/js/toc.js` (H-4.1.3 + silent feature removal)
`setExpanded(open)` now sets `aria-hidden="true|false"` on the TOC nav and toggles `tabindex="-1"` on every link, working in concert with the components.css change that drops `visibility: hidden` from `#toc.is-collapsed .toc-nav`. **Removes the entire `autoCollapsed`/`collapseOnce` dead-code path** (and its two call sites), so the TOC no longer auto-collapses on the first scroll.
- **Risks:** **The auto-collapse-on-first-scroll behavior is silently removed.** Users will now see the full TOC expanded throughout the read unless they manually collapse it. This is a real UX change, not flagged in the audit, and not commented anywhere in the diff. It should be explicitly called out in the commit message and ideally validated by Levi.
- **Verdict:** Mixed. The a11y fix is clean; the UX removal is unannounced.
### 4.9 `static/js/transclude.js` (M-3.2.1, M-3.2.2 follow-through)
`reinitFragment(container)` now calls `window.reinitSidenotes(container)` and `window.reinitPopups(container)` when present, with a fallback to the old `resize` event dispatch for sidenotes.
- **Verdict:** Net positive. Works in tandem with the public hooks added to `sidenotes.js` and `popups.js`.
### 4.10 `templates/default.html`, `templates/partials/head.html`, `templates/partials/nav.html`
- `default.html`: inline KaTeX `onload="..."` removed, replaced by two `defer` scripts (KaTeX CDN, then `/js/katex-bootstrap.js`). (M-4.2.1.)
- `head.html`: adds `<script src="/js/utils.js"></script>` synchronously before `theme.js`. (L-3.4.1.) **L-4.2.3 (unconditional CSS loading) is not addressed** — every component CSS file still loads on every page.
- `nav.html`: every `<button>` now has `type="button"` (11 buttons total). (L-4.2.2.)
- **Verdict:** Net positive on all three.
### 4.11 `static/css/base.css` (H-4.1.1, H-4.1.2, H-4.1.4)
(1) Adds `--transition-medium: 0.28s ease` and `--transition-slow: 0.5s ease` design tokens. (2) Defines `--rule`, `--font-ui`, `--bg-subtle` as aliases of `--border-muted`/`--font-sans`/`--bg-offset` (closing H-4.1.1). (3) Defines `--bp-phone`/`--bp-tablet`/`--bp-desktop`/`--bp-wide` as documentation tokens. (4) Bumps dark-mode `--text-faint` from `#6a6660` to `#8b8680` in two locations (closing H-4.1.2; new contrast ratio is ~3.92:1 against `#121212`, which clears 3:1 for non-text UI elements). (5) Adds global `:focus-visible` ring rules covering `button`, `a`, `summary`, `[role="button"]`, `input`, `select`, `textarea` (closing H-4.1.4).
- **Risks:** `--transition-medium`, `--transition-slow`, `--bp-phone`, `--bp-tablet`, `--bp-desktop`, `--bp-wide` are **defined but never consumed anywhere in the codebase**. They are documentation placeholders. The audit findings L-4.1.7 (inconsistent transitions) and M-4.1.6 (scattered breakpoints) are **not materially addressed** — the actual `transition:` and `@media` call sites were not migrated. The work was started but not finished.
- **Verdict:** Net positive. Three HIGH a11y findings closed correctly; dead tokens are scope creep that should either be removed or completed by migrating call sites.
### 4.12 `static/css/build.css`, `static/css/components.css`, `static/css/print.css`, `static/css/typography.css`
- `build.css`: adds heatmap fill rules `.heatmap-svg .hm0..hm4` pointing at `var(--hm-0..--hm-4)` and `.heatmap-svg .hm-lbl` styling. **This is needed because Stats.hs moved the previously-inline SVG `<style>` block into external CSS.** Verified the corresponding tokens exist in both light and dark mode.
- `components.css`: `#toc.is-collapsed .toc-nav` drops `visibility: hidden` and gains `pointer-events: none` on collapsed links as belt-and-suspenders. (H-4.1.3.)
- `print.css`: replaces hardcoded `#fff`/`#000`/etc. with `var(--bg)`/`var(--text)`/`var(--bg-subtle)`/`var(--border-muted)`/`var(--text-faint)`, and adds a `@media print` `:root,[data-theme="dark"]` block that forces a light palette. (M-4.1.5.)
- `typography.css`: adds `figure:has(> img) { display: table }` for shrink-wrapped image captions; changes figcaption font-size from `var(--text-size-small)` to `0.92em`. **Scope creep** — neither change maps to an audit finding.
- **Verdict:** All net positive. The typography figcaption font-size change is a visual regression for anyone who had tuned `--text-size-small` expecting it to apply to captions; minor.
---
## 5. Tools, Makefile, and configuration
### 5.1 `Makefile` (M-6.1.1, M-6.1.2)
(1) Adds `test -n "$(VPS_USER)" || exit 1` guards for `VPS_USER`/`VPS_HOST`/`VPS_PATH` in the deploy recipe. (2) Reorders deploy: rsync now runs **before** `git push -u origin main`, so a failed rsync leaves the GitHub mirror still pointing at the last successful deploy. (3) Adds target-specific `SITE_ENV=dev` exports for `watch` and `dev` (which feed the new `Site.hs` draft mode). (4) Adds explanatory comments above the content/ auto-commit and the dev gate.
- **Risks:** Because `deploy: clean build sign` runs *before* the VPS guards fire, a missing `.env` costs a full clean build before the failure surfaces. Cosmetic. **M-6.1.3 (build commits content/ before building, never cleans up on failure) is acknowledged in a comment but not actually fixed.**
- **Verdict:** Net positive.
### 5.2 `.env.example`
Adds explicit `VPS_USER`/`VPS_HOST`/`VPS_PATH` keys with header/section comments matching the new Makefile guards. Pure documentation.
- **Verdict:** Net positive.
### 5.3 `README.md` (M-7.6.1)
From a one-line stub to a ~79-line user-facing README covering quickstart commands, optional features (embeddings, semantic-search model, image conversion, PDF thumbnails), `.env` configuration, repository layout, and architecture pointers. Cross-references `build/Patterns.hs`, `build/Site.hs`, `build/Compilers.hs`, `build/Filters/Images.hs`, `tools/convert-images.sh`, and `spec.md`.
- **Verdict:** Net positive. One mild caveat: the README advertises `make dev` as the day-to-day command, but `dev` doesn't re-run `convert-images.sh` or `pdf-thumbs` like `build` does, so an author adding a JPEG won't get WebP companions until they `make build`. Worth a future clarification.
LN: Address in the README that `make build` should precede `make dev` when figures, etc. have changed.
### 5.4 `levineuwirth.cabal`
(1) Adds `Patterns` and `Filters.EmbedPdf` to `other-modules`. (2) Removes `Metadata`. (3) Adds `blaze-html >= 0.9 && < 0.10` and `blaze-markup >= 0.8 && < 0.9` to `build-depends` (required by the Stats.hs blaze rewrite). (4) Drops `-Wno-unused-imports` from `ghc-options`.
- **Verification:** `cabal build` reports "Up to date" — confirms no compile-breaking refactor. The `-Wno-unused-imports` removal is non-regressive: cabal uses `-Wall` only (no `-Werror`), so unused imports surface as warnings, not errors. Levi will see them during `cabal build` and can clean them up incrementally.
- **Verdict:** Net positive.
### 5.5 `cabal.project.freeze`
Two patch-level pin bumps: `OneTuple 0.4.2 → 0.4.2.1`, `text-short 0.1.6 → 0.1.6.1`. Both transitive deps. Low-risk.
- **Verdict:** Net positive.
LN: you can ignore any such patch bumps and do not worry about dependencies. If there are ever issues with dependencies as we continue to iterate on the audit, just run `tools/refreeze.sh` and they will be solved.
### 5.6 `pyproject.toml` (M-5.4.1)
Adds upper bounds to every dependency pin: `matplotlib<4`, `altair<6`, `sentence-transformers<4`, `faiss-cpu<2`, `numpy<3`, `beautifulsoup4<5`, `torch<3`. Adds a rationale comment.
- **Verdict:** Net positive.
### 5.7 `tools/embed.py` (H-5.1.1)
`_url_from_path` now explicitly handles the root `index.html` case: if the parent is `.` or `""`, return `"/"`; otherwise return `"/" + parent + "/"`. This was the audit's HIGH about `SimilarLinks.hs` never matching the homepage.
- **Risks:** L-5.1.2 (no `--quiet` mode) and L-5.1.3 (re-stats every HTML on every run) are not addressed. Both are LOW.
- **Verdict:** Net positive.
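
The root-page rule described above is small enough to sketch. The function below is a hypothetical stand-in named `url_from_index_path`, not the body of `_url_from_path` in `tools/embed.py`; it covers only paths ending in `index.html` and assumes POSIX-style relative paths.

```python
from pathlib import PurePosixPath


def url_from_index_path(rel_path):
    """Map a site-relative '.../index.html' path to its directory URL."""
    parent = PurePosixPath(rel_path).parent.as_posix()
    # The site root's parent is '.', which must map to '/' rather than '/./'.
    if parent in (".", ""):
        return "/"
    return "/" + parent + "/"
```

Without the special case, the homepage would be keyed as `/./`, which never equals the `/` that a link normaliser produces, hence the missed match.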
### 5.8 `tools/import-poetry.py` (H-5.2.1, H-5.2.2, M-5.2.3, M-5.2.4)
(1) `roman_to_int` bails out on empty string and adds an inner `i < len(s)` bounds check. (2) `yaml_str` adds `\n`, `\r`, `\t` to the needs-quoting trigger set and explicitly escapes them in the output. (3) `main()` validates `args.date` as an integer in `[1, 2100]`. (4) `main()` asserts `title_prefix.strip()` is nonempty. (5) `main()` asserts `collection_slug` is nonempty and not just `"-"`.
- **Risks:** L-5.2.5 (`write_text` no `errors=` kwarg) is not addressed.
LN: let's address this
- **Verdict:** Net positive.
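
For the first item, a guarded Roman-numeral parser might look like the sketch below. It is a minimal illustration of the empty-string bail-out and the inner bounds check, not the code in `tools/import-poetry.py`; returning `None` on bad input is an assumption, and the sketch does not reject non-canonical numerals such as `IIII`.

```python
ROMAN = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def roman_to_int(s):
    """Parse a Roman numeral; return None for empty or non-Roman input."""
    s = s.strip().upper()
    if not s or any(ch not in ROMAN for ch in s):
        return None  # assumed failure signal
    total = 0
    i = 0
    while i < len(s):
        cur = ROMAN[s[i]]
        # Bounds check before peeking at the next symbol.
        if i + 1 < len(s) and ROMAN[s[i + 1]] > cur:
            total += ROMAN[s[i + 1]] - cur  # subtractive pair, e.g. IV, XC
            i += 2
        else:
            total += cur
            i += 1
    return total
```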
### 5.9 `tools/sign-site.sh`
Replaces a sequential `while read | gpg` loop with `find ... -print0 | xargs -0 -I {} -P $(nproc) gpg ...`, parallelizing signing across CPU cores.
- **Risks:** Under `set -euo pipefail`, a single `gpg` failure aborts the script via `xargs` exit code 123, but several other sign operations may have already started — partial signing state is left on disk, where the previous sequential implementation stopped immediately. The post-loop `count = find ... | wc -c` counts HTML files, not signatures actually written, so the reported count is misleading after a partial failure. Acceptable for a sign step (rerun fixes it), but a behavioral change worth a comment.
- **Verdict:** Net positive (real perf win, minor rough edge).
### 5.10 `tools/download-model.sh` (L-6.4.1)
Adds a supply-chain SHA-256 verification layer. New `expected_sha()` and `verify_sha()` helpers look up a relative path in `tools/model-checksums.sha256` (if present) and compare `sha256sum` output. `fetch()` calls `verify_sha` both on skip (file already present) and after successful download. On mismatch the file is deleted and the script exits 1. If the checksum file is absent, a one-line advisory is printed and downloads proceed unverified.
- **Risks:** The checksum file (`tools/model-checksums.sha256`) does **not** yet exist in the repo, so the first run stays advisory-only. The audit asked for the mechanism, not the pinned values — Levi will need to generate and commit the checksums once he has verified a model version out-of-band. The script's own comment block documents that workflow.
- **Verdict:** Net positive. Mechanism in place; pinning still pending.
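
The verification flow described above (look up a pinned digest by relative path, compare, and fall back to advisory mode when nothing is pinned) can be sketched in Python. The real script is shell and uses `sha256sum`; the function names here echo the description but the bodies, and the three-valued return of `verify_sha`, are assumptions.

```python
import hashlib


def parse_checksums(text):
    """Parse 'sha256sum'-style lines: '<hex digest>  <relative path>'."""
    table = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        digest, _, rel = line.partition(" ")
        # A leading '*' marks binary mode in sha256sum output.
        table[rel.strip().lstrip("*")] = digest.lower()
    return table


def sha256_of(path):
    """Stream the file so large model weights need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_sha(path, rel, table):
    """True = match, False = mismatch, None = nothing pinned (advisory)."""
    expected = table.get(rel)
    if expected is None:
        return None
    return sha256_of(path) == expected
```

Once `tools/model-checksums.sha256` is committed, the `None` branch stops being the common case and a mismatch becomes a hard failure.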
### 5.11 `tools/convert-images.sh`
Staleness check upgraded from "skip if .webp exists" to "skip if .webp exists **and** the source is not newer than the webp" (`! "$img" -nt "$webp"`). Previously, an edited source silently served a stale webp.
- **Verdict:** Net positive.
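
The upgraded staleness rule is equivalent to the following Python sketch, which uses modification times where the shell script uses `-nt`. The function name `needs_conversion` is illustrative and does not appear in `tools/convert-images.sh`.

```python
import os


def needs_conversion(src, webp):
    """Convert when the .webp is missing or older than its source."""
    if not os.path.exists(webp):
        return True
    # Equal timestamps count as up to date, like the shell test '! src -nt webp'.
    return os.path.getmtime(src) > os.path.getmtime(webp)
```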
### 5.12 Tools cross-cutting observations
- `tools/__pycache__/*.pyc` show as modified in git because they were apparently committed at some point and the source edits changed their hashes. They should be in `.gitignore` and removed from tracking — separate hygiene follow-up, not introduced by this branch.
- The cabal `other-modules` list is now internally consistent: `Metadata` removed, `Patterns` and `Filters.EmbedPdf` added, all referenced files exist on disk.
- `cabal build` succeeds, confirming no broken refactor.
LN: we can go ahead and make that gitignore change involving *.pyc
---
## 6. Content changes
### 6.1 Beyond Comorbidity Indices essay (move + rewrite)
The file moves from `content/essays/beyond-comorbidity-indices.md` (216 lines, deleted) to `content/essays/beyond-comorbidity-indices/index.md` (~368 lines, new). This is a substantial rewrite, not a mechanical reformat. Routing is supported by `build/Site.hs:23-24` (`publishedEssays` matches both `*.md` and `*/index.md`) and `build/Site.hs:215-224` correctly maps `content/essays/slug/index.md → essays/slug/index.html`.
**What's preserved:** YAML frontmatter (title, authors, affiliations, Icarian metadata, tags), Key Points callout, dropcap intro, all Pandoc fenced divs for figures, the structure of Tables 1-2, the core scientific claims (AUC values, DeLong tests, IG interpretability).
**What's new or changed:**
- Cohort numbers updated and now internally consistent: "over 113 million" decomposes as `80,217,696 + 33,322,761`.
- Date bumped from `2026-03-28` to `2026-04-09`.
- Two new frontmatter fields: `bibliography: data/bci-paper.bib` and `repository: "https://git.levineuwirth.org/neuwirth/beyond_comorbidity_indices"`. If the author intends the `repository` Icarian context field to be visible, verify that a template renders it; otherwise it is metadata-only.
- Numeric updates to ECI mortality AUC (0.6686 → 0.6414) and CCI mortality AUC (0.7217 → 0.7621), reflecting a re-fit.
- Methods/Results/Discussion prose substantially expanded; full Supplement (eMethods 1-4, eTables 1-3, eFigures 1-4) inlined.
LN: we are going to introduce a site-wide means of supplement and appendices, but not yet.
- New Code Availability + Conflict-of-Interest + Data Sharing sections added.
- The old version's "Structured Abstract" collapsible callout is removed; equivalent content survives in the manuscript body.
**Figures:** 10 PNG files exist in `content/essays/beyond-comorbidity-indices/figures/` and all 10 are referenced by the essay text (`fig2a`, `fig2b`, `fig3a`, `fig3b`, `fig4a`, `fig4b`, `efig1`, `efig2`, `efig4a`, `efig4b`). They will route to `/essays/beyond-comorbidity-indices/figures/*.png` via the `content/essays/**` static-asset rule at `build/Site.hs:233-235`. **Two placeholders remain:** Figure 1 ("Flow chart of discharge records") and eFigure 3 ("Calibration reliability plots") are wrapped in `annotation--static` divs with `[placeholder]` text — no image file. These should be resolved before deploy.
**Citations:** 38 unique citation keys extracted from the essay; 38 `@entry{...}` blocks in `data/bci-paper.bib` (new file). The sets are **identical** — every citation resolves, no dead keys, no unused entries. The frontmatter `bibliography: data/bci-paper.bib` correctly overrides the default `data/bibliography.bib` per `build/Compilers.hs:144`.
- **Verdict:** Net positive. Major rewrite, all figures and citations resolve, infrastructure supports the directory layout. Two placeholder figures and the new `repository` frontmatter field need attention before deploy.
LN: what needs to be done about repository frontmatter? Please discuss with me.
### 6.2 `content/colophon.md`
Three polished prose paragraphs are replaced with slightly more casual versions. Closing line changes from "git history functions as an authoritative record" to "git repository on Forgejo... should always be considered to take precedence." A dropcap paragraph about tools being "chosen rather than accepted" is removed. The Hyprland-on-both-machines paragraph is removed. The Emacs paragraph is expanded to announce a "Pmacs" side project for Summer 2026.
- **Verdict:** **Mixed.** The original prose was tighter and more distinctive ("every tool I use was chosen rather than accepted. This distinction matters..."). The new prose sacrifices precision in places ("I am, like many passionate nerds within the realm of computing, obsessive over my technological choices") and announces "Pmacs" without context. If the intent is voice recalibration toward less formal, it works; otherwise it reads like a rough draft compared to `main`. No broken references.
LN: the colophon, like all else on this website, is iterative. It is not intended to be a dissertation, but rather informal reading for the curious surfer. I am continuing to revise it iteratively, but don't worry about this change.
### 6.3 `content/essays/modern_idolatry.md` (untracked, **CRITICAL**)
This file's frontmatter declares `status: "Draft"`, but **the Hakyll publication gate is path-based, not frontmatter-based.** See `build/Site.hs:23-24`:
```haskell
publishedEssays = "content/essays/*.md" .||. "content/essays/*/index.md"
draftEssays = "content/drafts/essays/*.md" .||. "content/drafts/essays/*/index.md"
```
Since the file lives at `content/essays/modern_idolatry.md`, it matches `publishedEssays`. The moment this file is committed and `make deploy` (or `make build`) runs in non-dev mode, it will publish to the live site regardless of its `status` frontmatter. The audit's hygiene note that "`content/modern_idolatry.md` was at the project root" was *partially* addressed: the file moved under `content/essays/`, which is better organizationally, but it now also matches the `publishedEssays` glob — this is a *worse* state than the original location.
**Action required:** Move the file to `content/drafts/essays/modern_idolatry.md`. Note that `content/drafts/essays/` does not currently exist on disk and would need to be created. Alternatively, if the essay is in fact ready, flip `status` to a published value and verify via `make dev` first.
LN: It should be moved to /content/drafts/essays, which can be created.
- **Verdict:** **Negative as currently staged.** This is the single most concerning issue introduced by the branch.
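The path-based nature of the gate can be illustrated with a small sketch that does not depend on Hakyll. The names `Gate`, `matchesUnder`, and `classify` are hypothetical, and the matching is a simplified stand-in for the two globs quoted above rather than Hakyll's own `Pattern` matcher. The point is only that the `status` frontmatter never enters the decision.

```haskell
import Data.List (isSuffixOf)

-- Hypothetical illustration; not code from build/Site.hs.
data Gate = Published | Draft | Unmatched
  deriving (Eq, Show)

-- Split a relative path on '/' (base-only helper).
splitPath :: FilePath -> [String]
splitPath p = case break (== '/') p of
  (seg, [])       -> [seg]
  (seg, _ : rest) -> seg : splitPath rest

-- Mirrors the shape of the two globs: DIR/*.md or DIR/*/index.md.
matchesUnder :: [String] -> [String] -> Bool
matchesUnder dir segs = case splitAt (length dir) segs of
  (prefix, [file])          -> prefix == dir && ".md" `isSuffixOf` file
  (prefix, [_, "index.md"]) -> prefix == dir
  _                         -> False

-- The decision depends only on the path; frontmatter is never consulted.
classify :: FilePath -> Gate
classify p
  | matchesUnder ["content", "essays"] segs           = Published
  | matchesUnder ["content", "drafts", "essays"] segs = Draft
  | otherwise                                         = Unmatched
  where segs = splitPath p

main :: IO ()
main = mapM_ (\p -> putStrLn (p ++ " -> " ++ show (classify p)))
  [ "content/essays/modern_idolatry.md"
  , "content/drafts/essays/modern_idolatry.md"
  , "content/essays/beyond-comorbidity-indices/index.md"
  ]
```

Under these assumptions the current location classifies as `Published` and the proposed location as `Draft`, which is why moving the file is sufficient and editing `status` is not.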
### 6.4 Workspace files (`audit.md`, `migrate_html.md`, `paper/*.docx`)
- **`paper/BeyondComorbidityIndices.docx` and `paper/BeyondComorbidityIndicesSupplement.docx`** — confirmed moved out of the project root into `paper/` (5.4 MB, untracked). The audit's hygiene recommendation is satisfied. However, `paper/` also contains LaTeX build artifacts (`main.aux`, `main.log`, multiple `pgftest*.{aux,log,pdf}`) that should probably be `.gitignore`'d — separate concern.
LN: We should git ignore LaTeX build artifacts sitewide. This is a change to make.
- **`audit.md`** at the repo root, untracked, not gitignored. Hakyll will not pick it up (its only matching rule is `content/*.md`, not repo-root `*.md`), so the build is safe. But the file is in limbo: a casual `git add .` could accidentally commit it.
- **`migrate_html.md`** — same situation as `audit.md`.
LN: the .md files will be removed after everything in this branch is done and we merge back into the main branch. They're temporary as we work.
- **Recommendation:** Either gitignore the workspace files (`audit.md`, `migrate_html.md`, `paper/*.docx`, `paper/*.aux`, `paper/*.log`, `paper/pgftest*.*`) or move the two workspace docs into a tracked `docs/` folder.
- **Verdict:** Mixed. Docx move is correct; workspace docs need a decision.
LN: the docx was a temporary artifact that I will remove once the rewrite is entirely complete; this can be ignored.
### 6.5 Link integrity
`grep -r beyond-comorbidity-indices` across the full repo returns only two hits: the new essay itself and the `bci-paper.bib` comment header. **No templates, partials, Haskell sources, or other content files reference the old `content/essays/beyond-comorbidity-indices.md` flat path.** No stale internal links from the move.
---
## 7. Scope creep (changes outside the audit)
The following changes appear on the `audit-fixes` branch but were not requested by `audit.md`. They are not necessarily bad, but each represents an expansion of the review surface area:
1. **`build/Stats.hs` blaze-html rewrite.** The audit asked for a smarter `stripHtmlTags`. The branch delivers a full conversion of the HTML-generation paths to `blaze-html`, plus two new cabal dependencies. Defensible quality improvement; tripled the line count of affected functions.
2. **`build/Site.hs` draft-essay mode.** A new `SITE_ENV=dev` env-var gate that includes `content/drafts/essays/**` in dev builds. Clean implementation, but it's a new feature unrelated to any audit finding.
LN: this is a new feature that I added to give me a space to pull pieces from scratch ideation to public facing. It should stay, and was intentional.
3. **`build/Site.hs` random-pages.json fix.** The previous `intercalate "," . map show` was technically invalid for any URL containing a non-ASCII or control character, because `show` emits Haskell escapes (for example `\233`) that JSON does not define; the new `Aeson.encode` is correct. Quiet but real correctness improvement.
LN: this is good!
4. **`static/css/typography.css` figure layout.** Adds `figure:has(> img) { display: table }` for shrink-wrapped image captions; changes figcaption font-size from `var(--text-size-small)` to `0.92em`.
LN: we are still debugging some things related to the caption font sizes, so feel free to ignore this.
5. **`static/css/build.css` heatmap rules.** New CSS classes `.hm0..hm4`, `.hm-lbl` to support the moved-out-of-Stats.hs heatmap. Required by the Stats.hs change, not in itself a scope-creep item.
6. **`static/css/base.css` design tokens.** `--transition-medium`, `--transition-slow`, `--bp-phone`, `--bp-tablet`, `--bp-desktop`, `--bp-wide` are defined but **never consumed**. This starts the L-4.1.7 and M-4.1.6 work without finishing it.
7. **`static/js/toc.js` auto-collapse removal.** The `autoCollapsed`/`collapseOnce` dead-code path is deleted. This is a real UX change (TOC no longer auto-collapses on first scroll) and is not flagged anywhere.
LN: see below; this was intentional by me.
8. **New `bibliography: data/bci-paper.bib` and `repository:` frontmatter** on the BCI essay — supports the rewrite but adds metadata fields the templates may not yet render.
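For item 3, the failure mode of using `show` as a JSON encoder can be reproduced with `base` alone. This is a sketch of the old approach as quoted in the item, not the actual `Site.hs` code; `naiveJsonArray` is a hypothetical name and the URLs are made up.

```haskell
import Data.List (intercalate)

-- Old approach as described above: Haskell's Show instance used as a
-- JSON string encoder. Hypothetical reconstruction for illustration.
naiveJsonArray :: [String] -> String
naiveJsonArray urls = "[" ++ intercalate "," (map show urls) ++ "]"

main :: IO ()
main = do
  -- ASCII-only URLs happen to come out as valid JSON.
  putStrLn (naiveJsonArray ["/essays/a.html", "/essays/b.html"])
  -- A non-ASCII character is rendered as a decimal escape (\233 for U+00E9).
  -- That is Haskell string-literal syntax; JSON only defines \uXXXX.
  putStrLn (naiveJsonArray ["/essays/caf\233.html"])
```

The second line of output contains a backslash followed by `233`, which a strict JSON parser should reject. A real JSON encoder such as `Aeson.encode` avoids the problem by construction.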
---
## 8. Audit findings NOT addressed by this branch
These items are listed in `audit.md` but are not closed by the diff. Most are LOW; the listed MEDIUMs are explicit punts, and H-1.10.1 is only partially closed.
**Haskell core:**
- L-1.1.1: blog posts still flat-only (no `content/blog/*/index.md` form).
- L-1.1.4: `Site.hs` `library.html` still calls `loadAll` 32 times for 8 portals.
- L-1.2.3: `abstractField` still only strips single-`Para` abstracts.
- L-1.2.5: `pageScriptsField` still uses script path as Hakyll item identifier (collision risk).
- L-1.7.2: `Stability.hs` `unsafeCompiler` still breaks Hakyll dep tracking on git HEAD changes.
- L-1.11.1: `Utils.wordCount` still counts HTML tokens as words.
- L-1.3.4: misleading "fixed" comment for the `String → Text → ByteString` round-trip in Stats.hs (see section 2.4).
- **H-1.10.1 partially**: directory-form essays now appear on author pages (Authors.hs fix), but `Stats.hs` still uses raw `content/essays/*.md` patterns — the writing-statistics page still under-counts them.
**Filters:**
- L-2.1.4: `Images.renderKvs` still does not escape attribute keys.
- L-2.7.2: `Wikilinks.slugify` trailing-period quirk (`"end."` → `"end"`).
- L-2.8.2: `EmbedPdf.parsePageHash` silent empty return.
- NIT (`Filters/Typography.hs`): duplicate `escHtml` still local.
**JavaScript / CSS:**
- M-3.3.2: `selection-popup.js` annotation picker swatches still mouse-only.
- M-3.3.3: `sidenotes.js` sidenote focus toggle still click-only.
- L-3.4.2: mixed `var` vs `const`/`let` across JS files (no mass conversion).
- L-4.1.7: transition timings — token added but no call sites migrated.
- M-4.1.6: breakpoint tokens — added but `@media` queries not migrated. (CSS `@media` cannot consume custom properties anyway; partial fix is the structural ceiling.)
- L-4.1.8: `font-variant: small-caps` shorthand still in `reading.css`/`library.css`.
- L-4.2.3: `head.html` still loads all component CSS unconditionally.
**Tools / Makefile:**
- L-5.1.2: `embed.py` no `--quiet` flag.
- L-5.1.3: `embed.py` `needs_update()` still re-stats every HTML.
- L-5.2.5: `import-poetry.py` `write_text` no `errors=` kwarg.
- L-6.1.4: `make clean` still doesn't touch `dist-newstyle/` or stale embeddings.
- M-6.1.3: build-failure cleanup not implemented (acknowledged in a comment as intentional).
---
## 9. New code-duplication introduced by the branch
The audit's section 8.1 ("Duplicate code") explicitly called out 5+ implementations of `escHtml`, 4+ of `trim`, 2 of `slugify`, 2 of `stringify`, 2 of `normaliseUrl`. The branch added consolidation helpers in `Utils.hs` (`escapeHtmlText`, `trim`, `authorSlugify`, `authorNameOf`) but several modules immediately re-introduced their own copies:
| Function | Where it now lives | Where it should be | Status |
|----------|---------------------|--------------------|--------|
| `escapeHtmlText` (Text variant) | `Utils.hs` | — | Used by Images, Smallcaps, Score, Viz ✓ |
| `escHtml` (local Text variant) | `Filters/Typography.hs` | `Utils.escapeHtmlText` | **Untouched duplicate** |
| `escAttr` (local String variant) | `Filters/Transclusion.hs` | `Utils.escapeHtml` | **New duplicate introduced** |
| `escText`/`escAttr` (local String variants) | `Catalog.hs` | `Utils.escapeHtml`/`escapeHtmlText` | **New duplicates introduced** |
| `safeHref` / URL allowlist | `Stats.hs` and `Catalog.hs` | `Utils.isSafeHref` (does not exist) | **Now duplicated** |
| `percentDecode` | `Backlinks.hs` and `SimilarLinks.hs` | `Utils.percentDecode` (does not exist) | **New duplication, byte-identical** |
| `trim` | `Utils.hs` | — | Used by Wikilinks, Transclusion, EmbedPdf ✓ |
| `trim` (Hakyll re-export) | `Contexts.hs` | `Utils.trim` | **Untouched alternate import** |
| `authorSlugify`, `authorNameOf` | `Utils.hs` | — | Used by Authors, Contexts ✓ |
| `stringify` (Pandoc inline) | `Filters/Images.hs` (expanded) | `Utils.stringify` (does not exist) | **Still local; expanded but not factored** |
**Net:** the branch consolidated `escHtml` for 4 files but introduced 3 new local duplicates (Transclusion, Catalog ×2). It centralized `slugify`/`nameOf` correctly. It added `trim` to Utils but did not fully migrate Contexts. It introduced two byte-identical copies of `percentDecode` and a third instance of the URL allowlist pattern. The duplication footprint of the codebase is roughly unchanged in net terms — different functions, same total count.
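As a rough picture of what a consolidated helper could look like, here is a base-only sketch of a `percentDecode` that both `Backlinks.hs` and `SimilarLinks.hs` might import. It is an assumption about the shape of such a function, not a copy of the two existing implementations, which may handle edge cases differently.

```haskell
import Data.Char (chr, digitToInt, isHexDigit)

-- Hypothetical shared helper (e.g. a future Utils.percentDecode).
-- Decodes %XX escapes one at a time and leaves malformed escapes untouched.
-- Each escape becomes a single Char, so multi-byte UTF-8 sequences would
-- still need a separate UTF-8 decoding step.
percentDecode :: String -> String
percentDecode ('%' : h : l : rest)
  | isHexDigit h && isHexDigit l =
      chr (digitToInt h * 16 + digitToInt l) : percentDecode rest
percentDecode (c : rest) = c : percentDecode rest
percentDecode []         = []

main :: IO ()
main = do
  putStrLn (percentDecode "/essays/on%20reading.html")
  putStrLn (percentDecode "100%")  -- trailing '%' is passed through
```

Whatever the final signature, keeping a single copy means any later fix (for example to UTF-8 handling) lands in both call sites at once.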
---
## 10. Build status
**`cabal build` reports "Up to date"** — meaning every Haskell module compiles successfully with the audit-fixes diff applied. There are no broken module references, no unbound names, no type mismatches. The deletion of `Metadata.hs` is consistent with both the cabal file and all Haskell source. The new `Patterns.hs` is consistent with the cabal file and is imported correctly by `Authors.hs`, `Backlinks.hs`, and `Tags.hs`. The new `blaze-html`/`blaze-markup` cabal entries are consistent with the actual imports in `Stats.hs`. The `-Wno-unused-imports` removal does not produce any new errors because the cabal file has no `-Werror`.
This does not validate the *behavior* of any change — only that the program is well-formed and links. The Stats.hs blaze rewrite, the Sidenotes AST refactor, and the Stats.hs symlink-aware `walkDir` should all be exercised against a real `_site/` build before deploy.
---
## 11. Recommended actions before merge / deploy
**Must-fix:**
1. **Move `content/essays/modern_idolatry.md` to `content/drafts/essays/modern_idolatry.md`** (creating the directory if needed). Otherwise it will publish on the next non-dev build despite its `status: "Draft"` frontmatter.
2. **Either resolve or remove the placeholder text** in Figure 1 and eFigure 3 of the BCI essay before deploy.
**Should-fix:**
3. **Adopt `Patterns.hs` in `Stats.hs` and `Site.hs`.** Replace the raw `content/essays/*.md` patterns at `build/Stats.hs:747,901` with `Patterns.essayPattern`, and replace `Site.hs`'s `publishedEssays` with `Patterns.essayPattern`. Otherwise H-1.10.1 is only partially closed and the H-1.1.2 drift the audit warned about persists.
4. **Remove or fix the misleading "L-1.3.4 fixed" comment in `Stats.hs`.** The code is unchanged from `main`; the comment is false.
5. **Document the silent removal of TOC auto-collapse-on-scroll** in the commit message, or restore it. The behavior change should be intentional and visible.
LN: this was a change that I made based on user feedback. It should stay.
6. **Decide what to do with `audit.md`, `migrate_html.md`, and `paper/*.docx`/`paper/*.aux`/`paper/*.log`.** Either gitignore them or move them into a tracked `docs/` folder. Currently they're in working-tree limbo.
**Nice-to-have:**
7. Factor `percentDecode` into `Utils.percentDecode` (called by both Backlinks and SimilarLinks). Factor `safeHref`/`isSafeUrl` into `Utils.isSafeHref` (called by Stats and Catalog). Replace local `escAttr` in Transclusion and Catalog with `Utils.escapeHtml`.
8. Either consume the new `--transition-medium`/`--transition-slow`/`--bp-*` tokens in `base.css` or remove them. As-is they're documentation placeholders that imply a refactor that hasn't happened.
9. Generate and commit `tools/model-checksums.sha256` so `download-model.sh` actually verifies, not just warns.
10. Migrate `Contexts.hs`'s implicit `Hakyll.trim` import to `Utils.trim` to complete the trim consolidation.
11. Add `tools/__pycache__/` and `paper/main.{aux,log,blg,out}`, `paper/pgftest*.*` to `.gitignore` and `git rm --cached` the existing `.pyc` entries.
12. If the `repository:` frontmatter on the BCI essay is intended to render somewhere on the page, verify the template emits it; otherwise it's metadata-only.
---
## 12. Final assessment
**Is every change a net positive?** No. Three concerns rise above the noise:
- The accidental-publish risk on `content/essays/modern_idolatry.md` is a regression introduced by this branch (the file was at the repo root in `main`-state and would not have been published; it is now at `content/essays/` and *will* be published).
- The partial adoption of `Patterns.hs` leaves the writing-stats page still affected by the same H-1.10.1 bug class the branch was meant to close.
- The misleading "fixed" comment in `Stats.hs` for L-1.3.4 will mislead any future reader auditing the same line.
**Is the branch worth merging?** **Yes, after the must-fix items above are addressed.** The fixes that landed correctly are substantial and high-value: the `lowerExt` CRITICAL is closed (which alone restores the entire WebP pipeline), every other HIGH except the partial H-1.10.1 is closed, the CSS a11y improvements are real and well-targeted, the build pipeline is more robust against deploy mistakes, and the BCI essay rewrite is high-quality. The Stats.hs blaze refactor and the Site.hs draft-mode feature are valuable improvements even though they exceed the audit's brief. The build still compiles cleanly.
The three concerns are all easily fixable in the next pass. None of them require reverting any of the work that's already been done.


@ -110,6 +110,13 @@ rules = do
route idRoute
compile copyFileCompiler
-- Semantic search index — produced by tools/embed.py; fetched at runtime
-- by static/js/semantic-search.js from /data/semantic-index.bin and
-- /data/semantic-meta.json.
match ("data/semantic-index.bin" .||. "data/semantic-meta.json") $ do
route idRoute
compile copyFileCompiler
-- Similar links — produced by tools/embed.py; absent on first build or
-- when .venv is not set up. Compiled as a raw string for similarLinksField.
match "data/similar-links.json" $ compile getResourceBody

154
migrate_html.md Normal file

@ -0,0 +1,154 @@
# Migration Plan: Refactoring `Stats.hs` HTML Generation
This document outlines a comprehensive migration plan for refactoring `build/Stats.hs` from manual string concatenation to a type-safe HTML combinator library, specifically `blaze-html`.
## Current Architecture and Issues
Currently, `build/Stats.hs` generates the HTML for the `/build/` and `/stats/` telemetry pages by manually concatenating raw strings (e.g., `"<div class=\"build-bar-row\">" ++ ...`).
This approach has several drawbacks:
1. **Security (XSS):** It is trivial to introduce Cross-Site Scripting (XSS) vulnerabilities if dynamic content (like post titles) is not manually escaped before being interpolated into the HTML string. The audit report specifically flagged the `link` function for this.
2. **Correctness:** It is easy to produce malformed HTML (e.g., missing closing tags, improperly nested elements, unescaped attributes) because the compiler cannot verify the structure of the string.
3. **Maintainability:** Complex HTML structures (like the 52-week activity heatmap) become difficult to read, modify, and debug when buried within string interpolation logic.
4. **Elegance:** It goes against the functional paradigm of building type-safe abstractions.
## Proposed Solution: `blaze-html`
`blaze-html` is a fast, mature, type-safe HTML combinator library for Haskell. It allows you to construct HTML documents using native Haskell functions and operators. By ensuring text and attribute values are escaped by default, it substantially reduces XSS risk. Furthermore, it improves structural correctness and reduces malformed markup by constructing HTML through typed combinators instead of ad hoc string concatenation.
**Scope:** This migration covers `build/Stats.hs` only. The separate `Site.hs` JSON-string-concat issue from the audit report is a distinct fix and is not addressed here.
For SVG generation (the heatmap), we will **not** add `blaze-svg` as a dependency. It is not currently in `cabal.project.freeze` and adding it would risk the dependency-resolution instability the audit already flagged. Instead, SVG elements will be emitted via blaze-html's custom-element facility (`Text.Blaze.Internal.customParent` / `customAttribute`), or via a small local helper module. This achieves type-safe SVG emission without a new dependency.
### 1. Dependency Updates
`blaze-html 0.9.2.0` is already pinned in `cabal.project.freeze` as a transitive dependency of Hakyll/Pandoc. The only required change is to declare it explicitly in `levineuwirth.cabal`.
* **Modify `levineuwirth.cabal`:** Add `blaze-html >= 0.9 && < 0.10` to the `build-depends` section of the `site` executable.
* **No freeze update required.** The package is already resolved; no `cabal freeze` run is needed.
### 2. Module Imports
In `build/Stats.hs`, import the core `blaze-html` modules:
```haskell
import qualified Text.Blaze.Html5 as H
import qualified Text.Blaze.Html5.Attributes as A
import Text.Blaze.Html.Renderer.String (renderHtml)
```
For SVG custom elements (heatmap), use blaze-html's internal custom-element facility:
```haskell
import qualified Text.Blaze.Internal as BI
```
The surrounding Hakyll template pipeline works with `Item String` (`makeItem` itself is polymorphic), so `renderHtml :: Html -> String` is the correct renderer. Use it and stop there; the stats page is a few dozen KB at most and performance is not a concern.
### 3. Refactoring Strategy
The refactoring process should be approached incrementally, function by function. **Crucially, intermediate functions must return `H.Html`, with rendering to `String` occurring only at the absolute outer boundary.**
#### Phase 1: URL Sanitization and Core Helpers
While `blaze-html` escapes text and attributes, it **does not validate URLs**. An attacker could still inject `javascript:alert(1)` into an `href` attribute. We must introduce URL validation alongside our typed HTML helpers.
* **URL Validation:**
`isSafeUrl` is defense-in-depth: in current code every URL is produced by Hakyll's `getRoute` or constructed as a `/tag/` string, so there is no live XSS surface. Nevertheless, include it to prevent regressions.
The naive prefix check in string-land fails on `JavaScript:` (case), `\tjavascript:` (leading whitespace), and `data:text/html` attacks. Use a case-insensitive, stripped allowlist instead:
```haskell
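import Data.Char (isSpace, toLower)
import Data.List (isPrefixOf)

-- Allowlist check: case-insensitive, ignoring leading whitespace.
isSafeUrl :: String -> Bool
isSafeUrl u =
  let norm = map toLower (dropWhile isSpace u)
  in  any (`isPrefixOf` norm) ["/", "https://", "mailto:", "#"]

-- Fall back to "#" for anything outside the allowlist.
safeHref :: String -> H.AttributeValue
safeHref u
  | isSafeUrl u = H.stringValue u
  | otherwise   = H.stringValue "#"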
```
Note: `http://` is intentionally excluded (mixed-content over HTTPS).
* **`link`:**
* *New:*
```haskell
link :: String -> String -> H.Html
link url title = H.a H.! A.href (safeHref url) $ H.toHtml title
```
* **`section`:**
* *New:*
```haskell
section :: String -> String -> H.Html -> H.Html
section id_ title body = do
  H.h2 H.! A.id (H.stringValue id_) $ H.toHtml title
  body
```
* **`table` and `dl`:**
These will utilize monadic `do` notation or `mapM_` over lists to generate rows and cells, returning `H.Html` natively.
* **Static TOC builders (`statsTOC`, `pageTOC`):** These also emit string-concat HTML and must be migrated here alongside the other primitives, not left for later.
#### Phase 2: Structural Components
Tackle the larger layout functions once the basic primitives are type-safe.
* **`renderContent`, `renderPages`, `renderDistribution`, `renderTagsSection`, `renderLinks`, `renderEpistemic`, `renderOutput`, `renderRepository`, `renderBuild`, `renderCorpus`, `renderNotable`, `renderMonthlyVolume`, `renderStatsTags`:**
All of these return `String` today and must be updated to return `H.Html`. They will compose the newly typed helper functions (`section`, `table`, `dl`).
*Example logic for a table row:*
```haskell
H.tr $ mapM_ (H.td . H.toHtml) cells
```
#### Phase 2.5: Lift the Heatmap's Inline `<style>`
The current heatmap (`renderHeatmap`) ships a `<style>` block embedded inside the SVG (`Stats.hs:207-211`). Migrate those rules to `static/css/` where the rest of the heatmap CSS variables (`--hm-0` … `--hm-4`) live. This is the right moment to do it; don't carry the inline style into the typed version.
#### Phase 3: The Heatmap (`renderHeatmap`)
The heatmap generation involves nested SVG elements, CSS classes, and `<title>` tooltips.
* **Separation of Concerns:** Separate the data calculation from the rendering. Keep date, color, and layout calculations in pure data functions, and have the rendering functions handle strictly the HTML/SVG emission.
* **SVG via custom elements:** Use blaze-html's `Text.Blaze.Internal.customParent` and `customAttribute` to construct SVG elements type-safely, replacing `"<rect class=\"" ++ ...` with typed combinators; no `blaze-svg` dependency is required. Alternatively, define a minimal local `Svg` helper module (10-15 lines) that wraps the most-used SVG tags (`svg`, `rect`, `text_`, `figure`) before this phase begins.
#### Phase 4: Integration with Hakyll
Finally, update the top-level Hakyll rules that consume these generated structures. This is the only place `renderHtml` should be called.
* **`statsRules`:**
* The `content` variable will now represent a single, large `H.Html` value.
* Call `renderHtml` exactly once to produce a `String`, then pass it to `makeItem`. The `stripHtmlTags`-based word-count pipeline operates on that rendered string and is unaffected.
* The static TOC strings (`pageTOC`, `statsTOC`) are also rendered via `renderHtml` before being passed to `constField`.
* *Example:*
```haskell
let htmlContent = do
      renderContent rows
      renderPages allPIs oldestDate newestDate
      -- ...
    contentString = renderHtml htmlContent
    plainText     = stripHtmlTags contentString
```
#### Phase 5: Testing and Auditing
* **Auditing:** During migration, thoroughly search for and eliminate any remaining raw HTML helpers, pre-escaped content, or `unsafe` rendering patterns.
* **Testing:** Add specific tests for escaping behavior to ensure security goals are met:
* Title containing `<script>alert(1)</script>` renders escaped.
* Attributes with quotes are escaped correctly.
* Dangerous URLs (e.g., `javascript:...`) are rejected or rewritten by `isSafeUrl`/`safeHref`.
* Golden/snapshot tests to ensure generated HTML still contains the expected structure.
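
The dangerous-URL checks can be prototyped before any test framework is chosen. The sketch below repeats the plan's `isSafeUrl` so that it runs with `base` alone; the `cases` table and the `main` runner are illustrative assumptions, not an existing test suite, and the example URLs are made up.

```haskell
import Data.Char (isSpace, toLower)
import Data.List (isPrefixOf)

-- Same allowlist logic as the plan's isSafeUrl, repeated here so the
-- checks run without blaze-html.
isSafeUrl :: String -> Bool
isSafeUrl u =
  let norm = map toLower (dropWhile isSpace u)
  in  any (`isPrefixOf` norm) ["/", "https://", "mailto:", "#"]

-- (input, expected verdict) pairs; hypothetical examples.
cases :: [(String, Bool)]
cases =
  [ ("/essays/",              True)
  , ("HTTPS://example.org/",  True)
  , ("JavaScript:alert(1)",   False)
  , ("\tjavascript:alert(1)", False)
  , ("data:text/html,x",      False)
  , ("http://example.org/",   False)
  , ("//example.org/",        True)  -- protocol-relative: passes the "/" prefix
  ]

main :: IO ()
main = mapM_ report cases
  where
    report (url, expected) =
      putStrLn ((if isSafeUrl url == expected then "ok   " else "FAIL ") ++ show url)
```

The last case is worth a deliberate decision: a protocol-relative URL such as `//example.org/` satisfies the `"/"` prefix, so the allowlist as written accepts it. If that is not intended, the check would need to reject a leading `//` explicitly.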
### Summary of Benefits
Completing this migration will:
* **Substantially reduce XSS risk:** Text and attribute values will be escaped by default, and dangerous URLs will be validated and neutralized.
* **Improve structural correctness:** Using typed combinators prevents malformed markup and enforces balanced tags.
* **Improve composability:** Returning `H.Html` from all helper functions avoids "half-rendered" strings and double-escaping issues.
* **Improve readability and testability:** Complex UI components like SVG heatmaps will be declarative, and pure data processing will be decoupled from rendering.

Binary file not shown.

Binary file not shown.