diff --git a/audit.md b/audit.md deleted file mode 100644 index 55e098f..0000000 --- a/audit.md +++ /dev/null @@ -1,739 +0,0 @@ -# levineuwirth.org — Comprehensive Audit - -**Auditor:** Independent code review (read-only, no changes made) -**Date:** 2026-04-09 -**Scope:** ~15,400 lines across Haskell build system (`build/**/*.hs`), Pandoc filters (`build/Filters/*.hs`), static JavaScript (`static/js/*.js`), CSS (`static/css/*.css`), templates (`templates/**`), Python tooling (`tools/*.py`), shell scripts (`tools/*.sh`), `Makefile`, cabal/pyproject configuration, and repository hygiene. -**Methodology:** Direct reading of critical modules (`Site.hs`, `Contexts.hs`, `Stats.hs`, `Backlinks.hs`, `Compilers.hs`, `Citations.hs`, `Stability.hs`, `Catalog.hs`, `Commonplace.hs`, `Filters/*.hs`, `Makefile`, shell scripts, `embed.py`); parallel exploration of JS, CSS, templates, and the larger Python tools. - -Each finding is labeled by **severity** (`CRITICAL`, `HIGH`, `MEDIUM`, `LOW`, `NIT`) and cites file + line. The codebase is generally well-written — architecture is clean, modules are tightly scoped, YAML/frontmatter is parsed defensively, and escaping is applied in most HTML rendering sites. Most findings are local issues; the codebase does not exhibit systemic rot. - ---- - -## Executive summary - -**Confirmed correctness bugs (by impact):** - -| # | File | Severity | Summary | -|---|------|----------|---------| -| 1 | `build/Filters/Images.hs:110` | **CRITICAL** | `lowerExt` is mathematically wrong — returns `"image."` for `"image.jpg"`. Every local raster fails `isLocalRaster`, so **no `` / WebP wrapping happens site-wide**. The entire WebP pipeline is dead code. | -| 2 | `build/Commonplace.hs:126-131` | **HIGH** | Operator-precedence bug in `renderChronoView`: `a ++ if c then x else y ++ z` parses as `a ++ (if c then x else (y ++ z))`, so `` is never emitted when the commonplace book is empty → unclosed tag. | -| 3 | `tools/embed.py:68-73` | **HIGH** | Root `index.html` yields URL `"/./"` instead of `"/"`. Homepage is never matched by `SimilarLinks.hs`, so the "Related" block never renders on the home page. | -| 4 | `build/Authors.hs:50` | **HIGH** | `allContent` pattern does not include `content/essays/*/index.md` (directory-form essays). Author pages silently omit those essays. Compare against `Tags.hs:69`, which *does* include them. | -| 5 | `build/Filters/Score.hs:40` | **HIGH** | `TIO.readFile fullPath` is called with no existence check and no exception catch. A missing SVG aborts the entire build with a bare `openFile: does not exist` — no file name context, no graceful fallback. | -| 6 | `build/Filters/Viz.hs:96-99` | **HIGH** | Same pattern: `readProcessWithExitCode "python3" [fullPath]` runs even when `fullPath` doesn't exist; the only signal the author gets is a generic "non-zero exit". | -| 7 | `build/Filters/Sidenotes.hs:38` | **HIGH** | Sidenote labels wrap after the 26th note: `(n - 1) mod 26` turns note 27 into `a` again, creating duplicate `id="sn-a"` / `id="snref-a"` across the same document. Breaks in-page links and screen-readers. | -| 8 | `build/Filters/Images.hs:77` | **MEDIUM** | `passedKvs` filters only `loading` and `data-lightbox`, but not `id`, `class`, `alt`, or `title` — all of which are already emitted explicitly above. Any author-set `id=` or `class=` kv on an image is emitted **twice** in the ``, producing invalid HTML (``). | -| 9 | `build/Contexts.hs:263-264` | **MEDIUM** | `confidenceTrendField` uses `xs !! (length xs - 2)` (O(n) indexing) and `last xs`. They are guarded by a length check so they're safe, but this is a partial idiom in a module that otherwise uses total patterns. | -| 10 | `build/Filters/Links.hs:59` | **MEDIUM** | `not ("levineuwirth.org" 'T.isInfixOf' url)` — substring match. `https://evil-levineuwirth.org.attacker.com` is classified as *internal*, skipping `rel=noopener noreferrer target=_blank`. | - -**Defense-in-depth findings:** - -- `build/Filters/Transclusion.hs:41` interpolates the author-controlled `sec` section name into a `data-section="..."` attribute with no escaping. In a static site where all Markdown is author-authored this is not an exploitable XSS, but it is a raw-HTML injection primitive — a stray `"` in a section name will break markup, and any future lowering of the "author is trusted" assumption (PRs, multi-author site, user submissions) turns it into one. -- `build/Stats.hs:161-169` implements a correct URL allowlist (`isSafeUrl`) but accepts `"/"` as a prefix, which also matches `//evil.com` (protocol-relative URLs). Mostly cosmetic here since inputs come from Hakyll-computed routes, but the allowlist comment claims strict defense and this is a hole. -- Two different `authorSlugify` / `nameOf` implementations exist (`Authors.hs:30-39` and `Contexts.hs:147-154`). They'll drift the moment one is edited. -- Five copies of `escHtml` — `Utils.hs:18-26` (the "real" one), `Filters/Images.hs:135-142`, `Filters/Score.hs:88-92`, `Filters/Smallcaps.hs` (per the filter audit), `Filters/Viz.hs:178-182`, plus identical ones in JS (`annotations.js`, `popups.js`, `semantic-search.js`). Any fix must be made in 7+ places. - -**Repository hygiene:** - -- `.env` is gitignored and not tracked — good. -- `~5.4 MB` of `.docx` binaries (`BeyondComorbidityIndices*.docx`) sit in the repo root, untracked but present; they're build input for the new essay but should be moved under `paper/` or similar rather than the project root. -- `HOMEPAGE.md~` (zero-byte editor backup) is on disk; gitignore catches it, but it should be removed. -- `content/modern_idolatry.md` is untracked and not under `content/drafts/` — either it's a ready-to-publish draft that escaped the drafts workflow, or a forgotten scratch file. -- `build/Metadata.hs` contains only `module Metadata where` — a no-op placeholder dragged along since Phase 2. Delete or populate. -- `build/Filters/Math.hs` and `build/Filters/Dropcaps.hs` are `apply = id` placeholders; fine as TODO anchors, but `-Wno-unused-imports` in `levineuwirth.cabal` is masking warnings that would otherwise tell you so. - ---- - -## 1. Haskell build system (`build/*.hs`) - -### 1.1 `Site.hs` - -**L-1.1.1 — LOW — Blog posts do not support directory-form pages.** `content/blog/*.md` (line 249) only matches flat posts; compare to essays and poetry, which accept both flat and `*/index.md`. If the author ever wants to co-locate blog assets, they'll have to edit both the rule and `Backlinks.hs:allContent`. - -**L-1.1.2 — LOW — Backlinks pattern drift.** `allContent` in `Backlinks.hs:200-208`, `Authors.hs:50`, `Tags.hs:69`, and the implicit patterns in `Site.hs` all enumerate the same content types, slightly differently. Authors omits directory essays; Backlinks omits fiction/*/index.md; Tags includes both essay forms but not fiction. This divergence is the root of finding #4 (Authors missing directory essays) and will continue to produce silent bugs. Extract one canonical `Patterns.hs`. - -**L-1.1.3 — LOW — `draftEssays` → `isDev` ties build correctness to an environment variable read at rule registration.** `isDev <- preprocess $ ... lookupEnv "SITE_ENV"` runs once at startup. Correct — but a developer toggling `SITE_ENV` mid-`cabal run site -- watch` will be confused. Worth a comment at the `preprocess` call, not just near `draftEssays`. - -**L-1.1.4 — LOW — `library.html` loads all content four times.** `portalList` calls `loadAll essays`, `loadAll posts`, `loadAll fiction`, `loadAll poetry` **inside the inner list body**, which is re-evaluated for each of the eight `portalList` calls. That's 32 `loadAll` calls for eight portals. Hakyll caches identifiers so the impact is bounded, but it's still unnecessary work; hoist the loads into the outer `compile` block. - -**NIT — `random-pages.json` (line 445).** The type annotation `:: Compiler [Item String]` on every binding is load-bearing because without it Hakyll can't infer the snapshot type. Fine, but a quick comment would save a future reader from thinking they're decorative. - -### 1.2 `Contexts.hs` - -**M-1.2.1 — MEDIUM — `authorLinksField` produces empty-slug URLs for empty author names.** `authorLinksField` (line 161) splits on `|`, trims, and calls `authorSlugify`. An entry like `"| https://url"` or `" "` produces name `""` → slug `""` → URL `/authors//`. Guard against empty names (fall back to `defaultAuthor` or skip the entry). - -**M-1.2.2 — MEDIUM — `parseMovements` silently drops malformed entries.** `parseMovements` (line 380-397) uses `catMaybes $ map parseOne` — an entry missing `name` or `page` is dropped with zero diagnostic. Compositions with a typo in one movement silently lose it. Add at least a `putStrLn` warning via `unsafeCompiler` or fail loudly. - -**L-1.2.3 — LOW — `abstractField` only strips single-`Para` abstracts.** Line 184-186: `Pandoc m [Para ils] -> Pandoc m [Plain ils]`. An abstract with inline `
` or line breaks becomes multiple `Para` blocks and the outer `

` is not stripped. Harmless but inconsistent. - -**L-1.2.4 — LOW — `confidenceTrendField` threshold of ±5 is undocumented.** Line 267-269: `c - p > 5` → up, `p - c > 5` → down. The comment in the header describes behavior but not the threshold. Magic number. - -**L-1.2.5 — LOW — `pageScriptsField` uses the script path as the item identifier.** Line 123: `Item (fromFilePath s) s`. If two separate frontmatter entries both load `shared.js`, they collide in Hakyll's item-store the first time `listField` evaluates them. Probably works by accident because the inner `script-src` field just returns `itemBody`; note the risk. - -**NIT — `getInt` via `Rational → Double → floor`** (line 396). If a page number is `1000000000000000000` (unlikely), Double precision loss. Use `Scientific.floatingOrInteger` from `scientific` (already transitively available via Aeson). - -### 1.3 `Stats.hs` - -**M-1.3.1 — MEDIUM — `stripHtmlTags` is naive.** Line 108-111 strips `<...>` greedily, ignoring `>` inside attribute values, `` comments, and ``. Used to compute word count and reading time for the `/build/` page so the impact is limited, but if a future author writes `alt="a > b"` (rare but legal) it'll slice the content. - -**M-1.3.2 — MEDIUM — `walkDir` has no symlink-loop protection.** Line 406-416 recurses through `_site` via `doesDirectoryExist`, which follows symlinks. A developer who accidentally symlinks `_site/a → _site` will infinite-loop the build. Use `doesDirectoryExist` + `pathIsSymbolicLink` (in `directory >= 1.3.6`). - -**L-1.3.3 — LOW — `isSafeUrl` allows protocol-relative URLs.** Line 161-164 accepts `"/"`-prefixed values. `"//evil.com"` matches this prefix. All current inputs are Hakyll-derived routes so the exposure is nil, but the comment ("Defense-in-depth URL allowlist") claims more rigor than the implementation provides. Fix: reject `u` that begins with `//`. - -**L-1.3.4 — LOW — `readFile`/`Aeson.decodeStrict` round-trip.** Line 741 decodes backlinks via `TE.encodeUtf8 (T.pack rawBL)` where `rawBL :: String`. That is `String → Text → ByteString` — three copies. Read the item as `Item ByteString` via `getResourceLBS` (or keep backlinks.json as bytes throughout) to avoid two conversions. - -**L-1.3.5 — LOW — Two separate tag sections.** `renderStatsTags` (line 380) and `renderTagsSection` (line 568) are the same function with different names. Consolidate. - -**L-1.3.6 — LOW — Lazy `readFile` in `countLinesDir`.** Line 455: `readFile (dir e)` holds the handle open until `length (lines content)` is fully forced. Under `forM`, multiple handles may be concurrently open. For a 30-file build directory it's fine; use `Data.Text.IO.readFile` for explicit strictness. - -**NIT — `lookupString "title" meta` fallback `"(untitled)"`** (line 71 and many siblings). Fine, but consider extracting a `titleOr` helper since it appears ~6 times. - -### 1.4 `Backlinks.hs` - -**L-1.4.1 — LOW — `normaliseUrl` does not URL-decode.** Line 188-194: stripping `?` and `#` is done on the raw URL without percent-decoding. A path like `/essays/caf%C3%A9` won't normalize to `/essays/café`. Current build likely does not emit percent-encoded routes, so this is latent. - -**L-1.4.2 — LOW — `backlinksField` does not handle the "item with noResult route" case explicitly.** When `getRoute item` is `Nothing`, it fails with `"backlinks: item has no route"`. Fine, but that path is unreachable for items that have an associated rule. Note it, remove if always reachable. - -**NIT — `renderBacklinks` concatenates strings; use blaze-html** to match `Stats.hs`. Not urgent; the output is static per build. - -### 1.5 `Citations.hs` - -**L-1.5.1 — LOW — Partial functions in `transformInline`.** Line 142: `head keys` / `head nums`. Guarded by `null nums` check above and by the structure of Pandoc `Cite` (never empty from the parser), so this is safe in practice. Swap to `case nums of (n:_) -> ...`. - -**L-1.5.2 — LOW — `markerHtml` concatenates `T.unpack . show` via `tshow`** but also builds `data-cite-keys` as a space-separated list of HTML IDs with no escaping. If a citation key contains a quote character (unusual but legal), the attribute breaks. - -**NIT — `stripRefPrefix` (line 209)** is `"ref-"`-specific; should be renamed `stripPandocRefPrefix` or documented with a pointer to the Pandoc source that emits it. - -### 1.6 `Compilers.hs` - -**L-1.6.1 — LOW — `pageCompiler` does not save a `toc` snapshot.** OK for pages that use `pageCtx`, but the commonplace, landing, and standalone pages that would benefit from a TOC get no opportunity. Not a bug — an architectural choice worth documenting. - -**NIT — `stringify`** is redefined here (line 56-77) in addition to `Filters/Images.hs:119-132` and the one `Text.Pandoc.Shared` exports. Three implementations. Pick one. - -### 1.7 `Stability.hs` - -**M-1.7.1 — MEDIUM — `readIgnore` uses lazy `readFile`.** Line 44: handle stays open until the whole list is forced. Fine for a single-shot read but the pattern is fragile; `Data.Text.IO.readFile` is strict. - -**L-1.7.2 — LOW — `unsafeCompiler` for git subprocess breaks Hakyll's dep tracking.** `stabilityField` calls `git log` via `unsafeCompiler`. Hakyll will not re-run the compiler when HEAD moves. Expected — `make build` always runs `git add content/` + commit first, which updates mtimes — but it's fragile to reason about. Worth a note at the `unsafeCompiler` call site rather than the header docs. - -**L-1.7.3 — LOW — `gitDates` ignores `stderr`.** Line 54: `(ec, out, _) <- readProcessWithExitCode ...` — `_` drops the error. If the file isn't tracked yet, git prints a warning to stderr; user sees nothing. Log it. - -**NIT — `stabilityFromDates` classification is undocumented magic.** `n <= 5 && age < 90` → "revising". These thresholds should be constants with intent comments. - -### 1.8 `Catalog.hs` - -**M-1.8.1 — MEDIUM — `renderEntry` does not escape frontmatter.** `ceTitle`, `ceYear`, `ceDuration`, `ceInstrumentation`, and `ceUrl` are pasted directly into HTML via `concat`. This is consistent with the site's "author-controlled trusted HTML in titles" convention (`Stats.hs:180-186` calls this out explicitly), but `Catalog.hs` has *no such comment*. If a collaborator's frontmatter contains a stray `<` or a malformed entry, the HTML breaks silently. - -Suggest: adopt the `pageLink` convention from `Stats.hs` — escape `href` via `safeHref`, pass title through `preEscapedToHtml` with a documented comment. - -**L-1.8.2 — LOW — `renderCategorySection` assumes non-empty group.** Line 194: `categoryLabel (ceCategory (head g))`. `groupBy` on a non-empty list produces non-empty sublists, so this is safe, but partial. - -**NIT — `categoryRank` uses `lookup` instead of `elemIndex`.** Shorter: -```haskell -categoryRank c = fromMaybe (length categoryOrder) (elemIndex c categoryOrder) -``` - -### 1.9 `Commonplace.hs` - -**H-1.9.1 — HIGH — Operator-precedence bug in `renderChronoView` (line 126-131).** -```haskell -renderChronoView entries = - "

" -``` - -Parses as `"..." ++ (if null sorted then "..." else (concatMap renderEntry sorted ++ ""))`. When `sorted` is empty, the closing `` is silently dropped. Fix: parenthesize the `if`, or split into two lines with explicit binding. - -**L-1.9.2 — LOW — `renderText` replaces `\n` with `
\n`** after escaping, which is correct, but does not escape `\r`. Windows-style line endings would produce `\r
`, leaving stray `\r` in HTML. Normalize line endings in `stripTrailingNL`. - -### 1.10 `Authors.hs` - -**H-1.10.1 — HIGH — `allContent` omits directory-form essays.** Line 50: -```haskell -allContent = ("content/essays/*.md" .||. "content/blog/*.md") .&&. hasNoVersion -``` - -Compare to `Tags.hs:69`, which adds `"content/essays/*/index.md"`. Any essay stored as `content/essays/foo/index.md` will NOT appear on its author's index page. This is the most likely source of silent "why isn't this essay on my author page" bugs. - -**L-1.10.2 — LOW — Duplicate of `Contexts.authorSlugify`.** `Authors.slugify` and `Contexts.authorSlugify` do the same thing with different definitions (the Contexts version normalizes before filtering, Authors version filters after lowercasing). The two will diverge on Unicode edge cases. Consolidate. - -### 1.11 `Utils.hs` - -**L-1.11.1 — LOW — `wordCount` counts HTML tokens as words.** Called from `Compilers.hs:172` on raw source `src` (Markdown, including any raw HTML) and from `Stats.hs:809` on tag-stripped HTML. On raw Markdown this miscounts `[display](url)` as three "words". Low-severity because the stat is approximate anyway, but worth noting when comparing `/stats/` numbers to `wc`. - -### 1.12 `Pagination.hs`, `Tags.hs`, `SimilarLinks.hs`, `Metadata.hs`, `Main.hs` - -No material issues. `Metadata.hs` is a two-line empty-module placeholder — delete or populate. - ---- - -## 2. Pandoc filters (`build/Filters/*.hs`) - -### 2.1 `Filters/Images.hs` — the big one - -**C-2.1.1 — CRITICAL — `lowerExt` returns the basename, not the extension.** Line 110: -```haskell -lowerExt = map toLower . reverse . ('.' :) . takeWhile (/= '.') . tail . dropWhile (/= '.') . reverse -``` - -Trace for `"image.jpg"`: -1. `reverse` → `"gpj.egami"` -2. `dropWhile (/= '.')` → `".egami"` -3. `tail` → `"egami"` -4. `takeWhile (/= '.')` → `"egami"` -5. `('.' :)` → `".egami"` -6. `reverse` → `"image."` -7. `toLower` → `"image."` - -So `lowerExt "image.jpg" == "image."` — which does not equal `.jpg`, `.jpeg`, `.png`, or `.gif`. **`isLocalRaster` is therefore `False` for every file**, the entire ``/WebP dispatch is dead code, and `tools/convert-images.sh` produces `.webp` companions that are never referenced. - -Fix: `System.FilePath.takeExtension` is already imported elsewhere and already pulled in transitively; replace with -```haskell -lowerExt = map toLower . takeExtension -``` - -**M-2.1.2 — MEDIUM — `passedKvs` duplicate-emits `id`, `class`, `alt`, `title`.** Line 77: -```haskell -passedKvs = filter (\(k, _) -> k `notElem` ["loading", "data-lightbox"]) kvs -``` - -But above, `attrId`, `attrClasses`, `attrAlt`, and `attrTitle` already emit those attributes from `(ident, classes, kvs)`. If an author writes `![alt](img.jpg){.foo title="bar"}`, Pandoc places `title` into `kvs`, so the output becomes ``. Expand the blacklist: -```haskell -passedKvs = filter (\(k, _) -> k `notElem` ["loading", "data-lightbox", "id", "class", "alt", "title"]) kvs -``` - -Side-note: the same issue affects the non-picture branch at line 47 indirectly (via the `Image` constructor Pandoc emits), but Pandoc's HTML writer handles dedup there. - -**M-2.1.3 — MEDIUM — `stringify` catches most but not all inline variants.** Line 119-132: handles `Str`, `Space`, `SoftBreak`, `LineBreak`, `Emph`, `Strong`, `Code`, `Link`, `Image`, `Span`. Misses `Strikeout`, `Superscript`, `Subscript`, `SmallCaps`, `Quoted`, `Cite`, `Math`, `RawInline`. Alt text for an image captioned `~subscript~` will be empty. - -**L-2.1.4 — LOW — `renderKvs` does not escape the key.** Line 94: `" " <> k <> "=\"" <> esc v <> "\""`. Keys in Pandoc come from Markdown attribute syntax and can only be identifiers, so this is safe in practice; but it's asymmetric with `v` and deserves either `esc k` or an assertion comment. - -**L-2.1.5 — LOW — `isUrl` misses `data:`, fine; misses `file://`, OK; misses `mailto:` not relevant here.** Accurate for the intended domain. - -### 2.2 `Filters/Transclusion.hs` - -**M-2.2.1 — MEDIUM — `sec` attribute not HTML-escaped.** Line 41: -```haskell -Just (slugToUrl slug, " data-section=\"" ++ sec ++ "\"") -``` - -`sec` is everything after `#` up to `}}` in the Markdown source. If an author writes `{{essay#a"b}}`, the emitted HTML is `
` — invalid markup. Not a realistic XSS vector on a single-author static site (would be a self-attack), but: -- It's an injection primitive. The moment content ever comes from a PR, a collaborator, or an imported source, it becomes one. -- The fix is one line: escape `"`, `<`, `>`, `&` before interpolation. - -**L-2.2.2 — LOW — `slugToUrl` appends `.html` unconditionally.** Line 46-49: `slug ++ ".html"`. If the slug is already `page.html`, you get `page.html.html`. Unlikely in practice (source convention is `{{essay-slug}}` with no extension), but guard against it. - -**NIT — `trim` re-implemented yet again.** Same function appears at least four times (`Transclusion.hs:59`, `EmbedPdf.hs:80`, `Wikilinks.hs:59`, plus `Contexts.hs`'s `strip`). Factor. - -### 2.3 `Filters/Score.hs` - -**H-2.3.1 — HIGH — `TIO.readFile fullPath` with no existence check and no exception handling.** Line 40. A Markdown file that references a missing SVG aborts the entire Hakyll build with nothing more than: -``` -openFile: does not exist (No such file or directory) -``` - -No filename, no page context, no recovery. Fix: -```haskell -existed <- doesFileExist fullPath -if not existed - then do putStrLn $ "[Score] missing: " ++ fullPath - return (Div ("", cls, attrs) blocks) - else do svgRaw <- TIO.readFile fullPath - ... -``` - -Or wrap in `try` and fall back to an `errorBlock` mirroring `Filters.Viz.errorBlock`. - -**M-2.3.2 — MEDIUM — Lazy-I/O `readFile` under `walkM`.** Using `Data.Text.IO.readFile` forces immediately, so this is actually OK — I retract the generic concern. The real issue is #H-2.3.1 above. - -**L-2.3.3 — LOW — `processColors` is order-sensitive.** The comment on line 56-58 acknowledges it: the 6-digit hex replacements come *last* in the function composition chain, which means they're applied *first*. That's correct and the comment is helpful. Keep the comment. - -**L-2.3.4 — LOW — `escHtml` reorder bug.** Line 88-92: -```haskell -escHtml = T.replace "\"" """ - . T.replace ">" ">" - . T.replace "<" "<" - . T.replace "&" "&" -``` - -`&` must be replaced *first*, else the `&` injected by other replacements gets its `&` replaced by `&` to become `&amp;`. Read bottom-up because of function composition: `&` → `<` → `>` → `"`. Wait — function composition: `f . g . h` applied to `x` is `f (g (h x))`. So the order executed is `&`, then `<`, then `>`, then `"`. **This is correct** (`&` first). Retracted — the `Viz.escHtml` at `Viz.hs:178-182` has the same composition order and is also correct. Nit only: write the function as a single chain with a comment stating the invariant. - -### 2.4 `Filters/Viz.hs` - -**H-2.4.1 — HIGH — No file-existence check before `readProcessWithExitCode`.** Line 96-99. Same class of bug as Score; the user sees `"non-zero exit"` with no path. Add `doesFileExist fullPath` before spawning. - -**M-2.4.2 — MEDIUM — Exception handler drops the exception detail.** Line 99: -```haskell -`catch` (\e -> return (ExitFailure 1, "", show (e :: IOException))) -``` - -The third tuple element is set to `show e`, but then on line 102 the caller reads it as `err` and displays it. That's actually correct — retracted. BUT the error bubbles up to `errorBlock` which renders `
...
` inline in the page. That's actually graceful. Good. - -**L-2.4.3 — LOW — `escScriptTag` only replaces `` inside strings. Vega-Lite specs won't contain those, so fine. - -**L-2.4.4 — LOW — `warn` uses `putStrLn` to stdout, not stderr.** Line 176. Mixes with Hakyll's build progress output. Use `hPutStrLn stderr`. - -### 2.5 `Filters/Sidenotes.hs` - -**H-2.5.1 — HIGH — Label wrap at 26 produces duplicate IDs.** Line 38: -```haskell -toLabel n = T.singleton (toEnum (fromEnum 'a' + (n - 1) `mod` 26)) -``` - -Note 27 → `a` again. Two `` and two `` in the same document. Duplicate IDs are invalid HTML, break `href="#sn-a"` fragment navigation, and confuse ATs. - -Fix options: -1. Use numeric labels: `"sn" ++ show n`. -2. Use two-letter labels for n > 26: `aa`, `ab`, …, `zz`. -3. Fail loudly with `error`: essays with >26 footnotes are rare and the user should know. - -**M-2.5.2 — MEDIUM — `replacePTags` is a string-level hack.** Line 57-60: -```haskell -replacePTags = - T.replace "

" "" - . T.replace "

" "" -``` - -A footnote whose content contains the literal text `

` (e.g., a code sample discussing HTML) will be mangled. Rare but possible. The correct fix is to transform the AST before writing, not the post-rendered HTML. - -### 2.6 `Filters/Links.hs` - -**M-2.6.1 — MEDIUM — `isExternal` uses substring match for the site domain.** Line 59: -```haskell -isExternal url = - ("http://" `T.isPrefixOf` url || "https://" `T.isPrefixOf` url) - && not ("levineuwirth.org" `T.isInfixOf` url) -``` - -`https://evil-levineuwirth.org.attacker.com/phish` contains `levineuwirth.org` as a substring → classified as *internal* → no `rel=noopener noreferrer target=_blank`. In 2026 with partitioned cookies this is mostly a cosmetic concern, but fix is trivial: -```haskell -isSameHost url = - case T.stripPrefix "https://" url <|> T.stripPrefix "http://" url of - Nothing -> False - Just rest -> - let host = T.takeWhile (\c -> c /= '/' && c /= ':') rest - in host == "levineuwirth.org" || "." `T.isSuffixOf` ("." <> host) -- etc. -``` - -or simpler: `host == "levineuwirth.org" || T.isSuffixOf ".levineuwirth.org" host`. - -**M-2.6.2 — MEDIUM — PDF links with fragment are not rewritten.** Line 30-36 requires `.pdf" T.isSuffixOf` url` — a URL like `/papers/foo.pdf#page=5` has suffix `5`, not `.pdf`, so it doesn't route through the PDF.js viewer. Compare to `EmbedPdf.hs` which does handle fragments in the source preprocessor path. Inconsistent. - -**L-2.6.3 — LOW — `domainIcon` duplicates twitter/x and youtube/youtu.be mappings.** Fine. Nit: table-driven via `lookup` would be cleaner than the chain of guards. - -### 2.7 `Filters/Wikilinks.hs` - -**M-2.7.1 — MEDIUM — `toMarkdownLink` does not escape `]` or `)`.** Line 33-36: -```haskell -toMarkdownLink inner = - let (title, display) = splitOnPipe inner - url = "/" ++ slugify title - in "[" ++ display ++ "](" ++ url ++ ")" -``` - -If the display text contains `]` or `)`, the generated Markdown is broken and Pandoc will parse it as raw text or as a weird link. Rare in practice (wikilink display is usually a plain name), but worth escaping. - -**L-2.7.2 — LOW — `slugify` uses `intercalate "-" . words . ...` — "a.b" → "a b" → "a-b".** That's by design (punctuation becomes space becomes hyphen). Note the trailing hyphen for inputs like "end.": space after "end" → `["end"]` → "end". OK. - -**NIT — Inefficient `trim` — `reverse . dropWhile ' ' . reverse . dropWhile ' '`.** Use `T.strip` if inputs were Text. `String`-based pipeline makes this unavoidable. - -### 2.8 `Filters/EmbedPdf.hs` - -**M-2.8.1 — MEDIUM — `encodeQueryValue` does not encode `#`.** Line 68-76: the encoder is called on `filePath`, which is already split on `#` by `parseDirective` (line 38). So the unencoded `#` issue doesn't bite here. However, the docstring at line 65 says "percent-encode characters that would break a query-string value" — `#` is such a character. Add it for defense in depth, even if the current call site doesn't benefit. - -**L-2.8.2 — LOW — `parsePageHash` silently produces `""` for invalid fragments.** Line 45-51. An author writing `{{pdf:/foo.pdf#garbage}}` silently drops the fragment. No warning. - -### 2.9 `Filters/Typography.hs`, `Filters/Code.hs`, `Filters/Smallcaps.hs`, `Filters/Dropcaps.hs`, `Filters/Math.hs` - -Scanned via the parallel sub-audit; only nit-level findings apply (duplicate `escHtml`, smart-quote edge case in abbreviation matching, `apply = id` placeholders). - ---- - -## 3. Static JavaScript (`static/js/*.js`) - -Audited by parallel exploration. The full per-file list is long; the aggregate pattern is: **no user-authored content is ever injected**, so `innerHTML` usage across `popups.js`, `annotations.js`, `citations.js`, and `selection-popup.js` is **not an XSS vector under the current authoring model**. The risk profile changes the moment the site accepts PRs, gains an annotations-backend, or proxies third-party content (none of which are planned per `spec.md`). - -### 3.1 XSS surface (all author-trust scoped) - -**M-3.1.1 — MEDIUM — `popups.js:608-614` copies `innerHTML` from the page into the popup.** The `epistemicContent` provider does `html += '

' + compact.innerHTML + '
'`. Because the source (`.ep-compact`) is emitted by our own Haskell code (`Contexts.hs` + templates), this is safe under the trust model. Switch to `compact.cloneNode(true)` + `popup.appendChild()` for a defense-in-depth fix that costs nothing. - -**M-3.1.2 — MEDIUM — `popups.js` cross-origin fetches (Wikipedia, arXiv, CrossRef, GitHub, etc.) don't validate `Content-Type`.** A malicious CORS-enabled endpoint could return HTML that the popup would render. Every fetch already pipes through an `esc()` call (line 655-661), so the risk is bounded to text that escapes in some corner. - -**L-3.1.3 — LOW — `citations.js:15, 56` and `annotations.js:167-172` use `innerHTML` with escaped data.** The escaping is correct; the fragility is that the escape-before-concat pattern is easy to get wrong in the future. - -### 3.2 Event handling / lifecycles - -**M-3.2.1 — MEDIUM — `sidenotes.js:73-94` attaches listeners per-sidenote with no cleanup path.** When `transclude.js` re-renders a fragment on resize, sidenotes accumulate duplicate handlers. Net effect: `update()` gets called 2×, 3×, … on hover over the same sidenote. Not a bug in the output, but a measurable leak over a long session. - -**M-3.2.2 — MEDIUM — `popups.js` attaches listeners at load time and never re-binds for transcluded content.** A transcluded essay's internal links have no popup previews. If transclusion is meant to feel "live", this is a user-visible gap. - -**M-3.2.3 — MEDIUM — `semantic-search.js:66-74` race in `loadModel`.** If two searches fire before the first model-load resolves, both call `import()` and `pipeline()`. Second call wastes CPU + memory. Track in-flight Promise: -```js -if (loadPromise) return loadPromise; -loadPromise = import(CDN).then(...); -``` - -### 3.3 Accessibility - -**H-3.3.1 — HIGH — `gallery.js` overlay has no focus trap.** `openOverlay()` focuses the close button, but Tab escapes into the backdrop. Pattern to copy: `settings.js:35-49`. - -**M-3.3.2 — MEDIUM — `selection-popup.js` annotation picker color swatches are mouse-only.** Arrow-key navigation + Enter to select would make it keyboard-accessible. - -**M-3.3.3 — MEDIUM — `sidenotes.js` sidenote focus toggle is click-only.** No keyboard equivalent. - -**L-3.3.4 — LOW — `lightbox.js:18,42` defaults `img.alt` to `""` and only later populates from source.** If source alt is missing, the lightbox image has no accessible name. Use `img.alt = srcAlt || 'Lightbox image'`. - -**L-3.3.5 — LOW — `theme.js:9-28` does not `try/catch` around `localStorage.getItem`.** Private-browsing Safari throws. The code happens to work because `getItem` returns `null` on failure *in most browsers*, but not all. - -### 3.4 Duplication and style - -**L-3.4.1 — LOW — HTML escaping reimplemented 3× across `annotations.js`, `popups.js`, `semantic-search.js`.** Add a shared `utils.js` (one function). - -**L-3.4.2 — LOW — Mixed `var` vs `const`/`let`.** `citations.js`, `nav.js`, `sidenotes.js`, `toc.js` use modern ES6+; `popups.js`, `annotations.js`, `gallery.js` use `var`. Pick one. - -**NIT — Magic-number sprinkles** for delays (`SHOW_DELAY=250`, `HIDE_DELAY=150`, `SHOW_DELAY=450`, swipe threshold `30`, etc.). Not worth a refactor. - ---- - -## 4. CSS and HTML templates - -Audited by parallel exploration. Highlights: - -### 4.1 CSS - -**H-4.1.1 — HIGH — Undefined CSS custom properties.** `build.css` uses `--rule` (lines 21, 30, 39, 69) and `--bg-subtle` (components.css:1448) and `--font-ui` (many places) that have no definition in `base.css`. Browsers treat `var(--undefined)` as the initial value → silent visual degradation on the `/build/` and annotation-related pages. - -Fix: -```css -:root { - --rule: var(--border-muted); - --font-ui: var(--font-sans); - --bg-subtle: #f5f5f5; -} -[data-theme="dark"] { --bg-subtle: #1f1f1f; } -``` - -**H-4.1.2 — HIGH — Dark-mode `--text-faint` contrast fails WCAG AA.** `#6a6660` on `#121212` ≈ 2.8:1. Used for sidenote numbers (0.65em!) and disabled-state icons. Bump to ~`#8b8680` (≈3.5:1) at minimum. - -**H-4.1.3 — HIGH — TOC collapse hides content from keyboard + AT.** `components.css:433-436` uses `visibility: hidden` on collapsed TOC, which removes it from the accessibility tree. Use `aria-expanded` + height transition, or `aria-hidden="true"` explicitly, or `display: none` (losing the smooth collapse). - -**H-4.1.4 — HIGH — No consistent `:focus-visible` ring across interactive elements.** `.nav-portal-toggle`, `.settings-toggle`, `.toc-toggle`, `.annotation-toggle` lack focus styles. Add a global: -```css -button:focus-visible, a:focus-visible { - outline: 2px solid var(--text); - outline-offset: 2px; -} -``` - -**M-4.1.5 — MEDIUM — Hardcoded hex in `print.css`.** `#fff`, `#000`, `#f9f9f9`, `#ddd` bypass variables. Move into a `@media print` `:root` overrides block. - -**M-4.1.6 — MEDIUM — Breakpoints are scattered.** `540px`, `680px`, `900px`, `1100px`, `1500px` appear across files with no central definition. Define once in `base.css`: -```css -:root { - --bp-phone: 540px; - --bp-tablet: 680px; - --bp-desktop: 900px; - --bp-wide: 1500px; -} -``` -(Note: CSS variables cannot be used inside `@media` queries; use Sass or a preprocessor, or settle for a comment + grep discipline.) - -**L-4.1.7 — LOW — Inconsistent transition timings.** `0.15s`, `0.28s`, `0.3s`, `0.35s`, `0.5s` scattered. Three tokens would cover all cases. - -**L-4.1.8 — LOW — Deprecated `font-variant` shorthand.** `reading.css:95` and `library.css:22` use `font-variant: small-caps`, which resets other OpenType features (like kerning). Use `font-variant-caps: small-caps`. - -### 4.2 HTML templates - -**M-4.2.1 — MEDIUM — `templates/default.html:30-33` inline onload script.** The KaTeX bootstrap is an inline onload attribute containing a multi-line JS expression. Works, but blocks any future strict CSP (`unsafe-inline`). Move to an external `katex-bootstrap.js` served from `/js/`. - -**L-4.2.2 — LOW — `templates/partials/nav.html` buttons lack `type="button"`.** If any nav is ever placed inside a `
`, Enter will submit. Belt-and-suspenders fix: add `type="button"` to every `