740 lines
51 KiB
Markdown
# levineuwirth.org — Comprehensive Audit

**Auditor:** Independent code review (read-only, no changes made)

**Date:** 2026-04-09

**Scope:** ~15,400 lines across Haskell build system (`build/**/*.hs`), Pandoc filters (`build/Filters/*.hs`), static JavaScript (`static/js/*.js`), CSS (`static/css/*.css`), templates (`templates/**`), Python tooling (`tools/*.py`), shell scripts (`tools/*.sh`), `Makefile`, cabal/pyproject configuration, and repository hygiene.

**Methodology:** Direct reading of critical modules (`Site.hs`, `Contexts.hs`, `Stats.hs`, `Backlinks.hs`, `Compilers.hs`, `Citations.hs`, `Stability.hs`, `Catalog.hs`, `Commonplace.hs`, `Filters/*.hs`, `Makefile`, shell scripts, `embed.py`); parallel exploration of JS, CSS, templates, and the larger Python tools.

Each finding is labeled with a **severity** (`CRITICAL`, `HIGH`, `MEDIUM`, `LOW`, `NIT`) and cites a file and line. The codebase is generally well-written — architecture is clean, modules are tightly scoped, YAML/frontmatter is parsed defensively, and escaping is applied in most HTML rendering sites. Most findings are local issues; the codebase does not exhibit systemic rot.

---
## Executive summary

**Confirmed correctness bugs (by impact):**

| # | File | Severity | Summary |
|---|------|----------|---------|
| 1 | `build/Filters/Images.hs:110` | **CRITICAL** | `lowerExt` is wrong: it returns `"image."` for `"image.jpg"`. Every local raster fails `isLocalRaster`, so **no `<picture>` / WebP wrapping happens site-wide**. The entire WebP pipeline is dead code. |
| 2 | `build/Commonplace.hs:126-131` | **HIGH** | Operator-precedence bug in `renderChronoView`: `a ++ if c then x else y ++ z` parses as `a ++ (if c then x else (y ++ z))`, so `</div>` is never emitted when the commonplace book is empty → unclosed tag. |
| 3 | `tools/embed.py:68-73` | **HIGH** | Root `index.html` yields URL `"/./"` instead of `"/"`. The homepage is never matched by `SimilarLinks.hs`, so the "Related" block never renders on the home page. |
| 4 | `build/Authors.hs:50` | **HIGH** | The `allContent` pattern does not include `content/essays/*/index.md` (directory-form essays), so author pages silently omit those essays. Compare `Tags.hs:69`, which *does* include them. |
| 5 | `build/Filters/Score.hs:40` | **HIGH** | `TIO.readFile fullPath` is called with no existence check and no exception handling. A missing SVG aborts the entire build with a bare `openFile: does not exist`, with no file-name context and no graceful fallback. |
| 6 | `build/Filters/Viz.hs:96-99` | **HIGH** | Same pattern: `readProcessWithExitCode "python3" [fullPath]` runs even when `fullPath` doesn't exist; the only signal the author gets is a generic "non-zero exit". |
| 7 | `build/Filters/Sidenotes.hs:38` | **HIGH** | Sidenote labels wrap after the 26th note: `(n - 1) mod 26` turns note 27 into `a` again, creating duplicate `id="sn-a"` / `id="snref-a"` in the same document. This breaks in-page links and screen readers. |
| 8 | `build/Filters/Images.hs:77` | **MEDIUM** | `passedKvs` filters out only `loading` and `data-lightbox`, not `id`, `class`, `alt`, or `title`, all of which are already emitted explicitly. Any author-set `id=` or `class=` kv on an image is emitted **twice** in the `<img>`, producing invalid HTML (`<img … id="x" id="x">`). |
| 9 | `build/Contexts.hs:263-264` | **MEDIUM** | `confidenceTrendField` uses `xs !! (length xs - 2)` (O(n) indexing) and `last xs`. Both are guarded by a length check, so they're safe, but this is a partial idiom in a module that otherwise uses total patterns. |
| 10 | `build/Filters/Links.hs:59` | **MEDIUM** | `` not ("levineuwirth.org" `T.isInfixOf` url) `` is a substring match: `https://evil-levineuwirth.org.attacker.com` is classified as *internal*, skipping `rel=noopener noreferrer target=_blank`. |

**Defense-in-depth findings:**

- `build/Filters/Transclusion.hs:41` interpolates the author-controlled `sec` section name into a `data-section="..."` attribute with no escaping. In a static site where all Markdown is author-authored this is not an exploitable XSS, but it is a raw-HTML injection primitive: a stray `"` in a section name will break markup, and any future weakening of the "author is trusted" assumption (PRs, multi-author site, user submissions) turns it into a real vulnerability.
- `build/Stats.hs:161-169` implements a URL allowlist (`isSafeUrl`) but accepts `"/"` as a prefix, which also matches `//evil.com` (protocol-relative URLs). Mostly cosmetic here, since inputs come from Hakyll-computed routes, but the allowlist comment claims strict defense and this is a hole.
- Two different `authorSlugify` / `nameOf` implementations exist (`Authors.hs:30-39` and `Contexts.hs:147-154`). They'll drift the moment one is edited.
- Five Haskell copies of `escHtml` exist: `Utils.hs:18-26` (the "real" one), `Filters/Images.hs:135-142`, `Filters/Score.hs:88-92`, `Filters/Smallcaps.hs` (per the filter audit), and `Filters/Viz.hs:178-182`, plus equivalent escapers in JS (`annotations.js`, `popups.js`, `semantic-search.js`). Any fix must be made in eight places.

**Repository hygiene:**

- `.env` is gitignored and not tracked — good.
- `~5.4 MB` of `.docx` binaries (`BeyondComorbidityIndices*.docx`) sit in the repo root, untracked but present. They're build input for the new essay, but they belong under `paper/` or similar, not in the project root.
- `HOMEPAGE.md~` (zero-byte editor backup) is on disk; gitignore catches it, but it should be removed.
- `content/modern_idolatry.md` is untracked and not under `content/drafts/`: either it's a ready-to-publish draft that escaped the drafts workflow, or a forgotten scratch file.
- `build/Metadata.hs` contains only `module Metadata where`, a no-op placeholder dragged along since Phase 2. Delete or populate.
- `build/Filters/Math.hs` and `build/Filters/Dropcaps.hs` are `apply = id` placeholders; fine as TODO anchors, but `-Wno-unused-imports` in `levineuwirth.cabal` is masking warnings that would otherwise tell you so.

---
## 1. Haskell build system (`build/*.hs`)

### 1.1 `Site.hs`

**L-1.1.1 — LOW — Blog posts do not support directory-form pages.** `content/blog/*.md` (line 249) only matches flat posts; compare essays and poetry, which accept both flat files and `*/index.md`. If the author ever wants to co-locate blog assets, they'll have to edit both the rule and `Backlinks.hs:allContent`.

**L-1.1.2 — LOW — Backlinks pattern drift.** `allContent` in `Backlinks.hs:200-208`, `Authors.hs:50`, `Tags.hs:69`, and the implicit patterns in `Site.hs` all enumerate the same content types, each slightly differently: Authors omits directory-form essays; Backlinks omits `fiction/*/index.md`; Tags includes both essay forms but not fiction. This divergence is the root cause of finding #4 (Authors missing directory essays) and will keep producing silent bugs. Extract one canonical `Patterns.hs`.

**L-1.1.3 — LOW — `draftEssays` → `isDev` ties build correctness to an environment variable read at rule registration.** `isDev <- preprocess $ ... lookupEnv "SITE_ENV"` runs once at startup. That is correct, but a developer toggling `SITE_ENV` mid-`cabal run site -- watch` will be confused. Worth a comment at the `preprocess` call, not just near `draftEssays`.

**L-1.1.4 — LOW — `library.html` loads every content collection eight times.** `portalList` calls `loadAll essays`, `loadAll posts`, `loadAll fiction`, and `loadAll poetry` **inside the inner list body**, which is re-evaluated for each of the eight `portalList` calls: 32 `loadAll` calls where four would do. Hakyll caches identifiers, so the impact is bounded, but it's still unnecessary work; hoist the loads into the outer `compile` block.

**NIT — `random-pages.json` (line 445).** The `:: Compiler [Item String]` annotation on each binding is load-bearing: without it, Hakyll can't infer the body type of the loaded snapshot. Fine, but a short comment would keep a future reader from mistaking the annotations for decoration.
### 1.2 `Contexts.hs`

**M-1.2.1 — MEDIUM — `authorLinksField` produces empty-slug URLs for empty author names.** `authorLinksField` (line 161) splits on `|`, trims, and calls `authorSlugify`. An entry like `"| https://url"` or `" "` produces name `""` → slug `""` → URL `/authors//`. Guard against empty names (fall back to `defaultAuthor` or skip the entry).

**M-1.2.2 — MEDIUM — `parseMovements` silently drops malformed entries.** `parseMovements` (lines 380-397) uses `catMaybes $ map parseOne`, so an entry missing `name` or `page` is dropped with no diagnostic, and a composition with a typo in one movement silently loses it. At minimum, print a warning via `unsafeCompiler`; better, fail loudly.

**L-1.2.3 — LOW — `abstractField` only strips single-`Para` abstracts.** Lines 184-186: `Pandoc m [Para ils] -> Pandoc m [Plain ils]`. An abstract that spans more than one paragraph becomes multiple `Para` blocks, and the outer `<p>` is not stripped. Harmless but inconsistent.

**L-1.2.4 — LOW — `confidenceTrendField` threshold of ±5 is undocumented.** Lines 267-269: `c - p > 5` → up, `p - c > 5` → down. The header comment describes the behavior but not the threshold, which is a magic number.

**L-1.2.5 — LOW — `pageScriptsField` uses the script path as the item identifier.** Line 123: `Item (fromFilePath s) s`. If two separate frontmatter entries both load `shared.js`, they collide in Hakyll's item store the first time `listField` evaluates them. This probably works by accident, because the inner `script-src` field just returns `itemBody`; note the risk.

**NIT — `getInt` via `Rational → Double → floor`** (line 396). A page number as large as `1000000000000000000` (unlikely) would lose precision in a `Double`. Use `Scientific.floatingOrInteger` from `scientific` (already transitively available via Aeson).
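A minimal sketch of the M-1.2.1 guard, assuming each author entry has the form `name | url` with the URL optional. `parseAuthorEntry` and `slugifyAuthor` are hypothetical names, and `slugifyAuthor` is a simplified ASCII-only stand-in rather than the real `authorSlugify`. The point is that an entry is rejected before a URL is ever built, whether its name is empty or merely slugifies to nothing.

```haskell
import Data.Char (isAlphaNum, isAscii, toLower)
import qualified Data.Text as T

-- | Simplified stand-in for authorSlugify (ASCII only, for illustration).
slugifyAuthor :: T.Text -> T.Text
slugifyAuthor =
  T.intercalate (T.pack "-") . T.words
    . T.map (\c -> if isAscii c && isAlphaNum c then toLower c else ' ')

-- | Parse "name | url" (url optional). Returns Nothing when the trimmed name,
-- or its slug, is empty, so callers can skip the entry or fall back to a
-- default author instead of emitting /authors//.
parseAuthorEntry :: T.Text -> Maybe (T.Text, Maybe T.Text)
parseAuthorEntry raw
  | T.null name || T.null (slugifyAuthor name) = Nothing
  | otherwise = Just (name, if T.null url then Nothing else Just url)
  where
    (namePart, rest) = T.breakOn (T.pack "|") raw
    name = T.strip namePart
    url  = T.strip (T.drop 1 rest)  -- drop the '|' itself, if present
```

Callers would then `mapMaybe parseAuthorEntry` over the raw entries, or substitute the default author when the result is `Nothing`.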
### 1.3 `Stats.hs`

**M-1.3.1 — MEDIUM — `stripHtmlTags` is naive.** Lines 108-111 strip `<...>` spans without parsing, ignoring `>` inside attribute values, `<!-- ... -->` comments, and `<![CDATA[...]]>` sections. It is used to compute word count and reading time for the `/build/` page, so the impact is limited, but if a future author writes `alt="a > b"` (rare but legal) it will slice the content.

**M-1.3.2 — MEDIUM — `walkDir` has no symlink-loop protection.** Lines 406-416 recurse through `_site` via `doesDirectoryExist`, which follows symlinks. A developer who accidentally symlinks `_site/a → _site` will send the build into an infinite loop. Check `pathIsSymbolicLink` alongside `doesDirectoryExist` (in `directory >= 1.3.6`) and skip linked directories.

**L-1.3.3 — LOW — `isSafeUrl` allows protocol-relative URLs.** Lines 161-164 accept any `"/"`-prefixed value, and `"//evil.com"` matches that prefix. All current inputs are Hakyll-derived routes, so the exposure is nil, but the comment ("Defense-in-depth URL allowlist") claims more rigor than the implementation provides. Fix: reject any `u` that begins with `//`.

**L-1.3.4 — LOW — `readFile`/`Aeson.decodeStrict` round-trip.** Line 741 decodes backlinks via `TE.encodeUtf8 (T.pack rawBL)` where `rawBL :: String`: a `String → Text → ByteString` chain that materializes three copies of the data. Read the item as a `ByteString` via `getResourceLBS` (or keep `backlinks.json` as bytes throughout) to avoid both conversions.

**L-1.3.5 — LOW — Two separate tag sections.** `renderStatsTags` (line 380) and `renderTagsSection` (line 568) are the same function under different names. Consolidate.

**L-1.3.6 — LOW — Lazy `readFile` in `countLinesDir`.** Line 455: `readFile (dir </> e)` holds the handle open until `length (lines content)` is fully forced, so under `forM` several handles may be open at once. For a 30-file build directory that's fine; use `Data.Text.IO.readFile` for explicit strictness.

**NIT — `lookupString "title" meta` fallback `"(untitled)"`** (line 71 and many siblings). Fine, but consider extracting a `titleOr` helper, since the pattern appears ~6 times.
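One way to close the L-1.3.3 hole, sketched on `String`. The scheme list in `allowedSchemes` is an assumption (the real allowlist in `Stats.hs` may differ). What matters is the ordering: `//` is rejected before a single leading `/` is accepted. The sketch also rejects `/\`, because browsers treat a backslash like a forward slash in `http(s)` URLs, so `/\evil.com` behaves like `//evil.com`.

```haskell
import Data.List (isPrefixOf)

-- | Assumed allowlist of absolute schemes; the real list may differ.
allowedSchemes :: [String]
allowedSchemes = ["https://", "http://"]

-- | Accept site-relative paths ("/essays/x") and allowlisted absolute URLs.
-- Protocol-relative URLs ("//host/...") inherit the page's scheme and can
-- point at any host, so they are rejected, as is the "/\" variant.
isSafeUrl :: String -> Bool
isSafeUrl u
  | "//"  `isPrefixOf` u = False
  | "/\\" `isPrefixOf` u = False
  | "/"   `isPrefixOf` u = True
  | otherwise            = any (`isPrefixOf` u) allowedSchemes
```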
### 1.4 `Backlinks.hs`

**L-1.4.1 — LOW — `normaliseUrl` does not URL-decode.** Lines 188-194 strip `?` and `#` from the raw URL without percent-decoding it, so a path like `/essays/caf%C3%A9` won't normalize to `/essays/café`. The current build likely does not emit percent-encoded routes, so this is latent.

**L-1.4.2 — LOW — `backlinksField` does not handle the "item with no route" case explicitly.** When `getRoute item` is `Nothing`, it fails with `"backlinks: item has no route"`. That is fine, and the path is unreachable for items that have an associated rule. Document that invariant at the failure site so a future reader doesn't treat it as a live error path.

**NIT — `renderBacklinks` concatenates strings; use blaze-html** to match `Stats.hs`. Not urgent, since the output is static per build.
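A sketch of L-1.4.1: percent-decode the path into bytes, then decode those bytes as UTF-8. `percentDecode` and this `normaliseUrl` are hypothetical reconstructions. The real `normaliseUrl` may also do other normalization, which is omitted here. Note that decoding `%2F` yields a literal `/`. If routes can legitimately contain an encoded slash, that case would need to be excluded.

```haskell
import qualified Data.ByteString as BS
import Data.Char (digitToInt, isHexDigit)
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE
import Data.Word (Word8)

-- | Decode %XX escapes into bytes, then decode the bytes as UTF-8.
-- Returns Nothing if the decoded bytes are not valid UTF-8.
percentDecode :: T.Text -> Maybe T.Text
percentDecode t =
  either (const Nothing) Just (TE.decodeUtf8' (BS.pack (go (T.unpack t))))
  where
    go :: String -> [Word8]
    go ('%' : a : b : rest)
      | isHexDigit a && isHexDigit b =
          fromIntegral (digitToInt a * 16 + digitToInt b) : go rest
    -- Anything else (including a malformed '%') is re-encoded as UTF-8 bytes.
    go (c : rest) = BS.unpack (TE.encodeUtf8 (T.singleton c)) ++ go rest
    go [] = []

-- | Hypothetical normaliser: drop ?query and #fragment, then percent-decode,
-- falling back to the undecoded path if decoding fails.
normaliseUrl :: T.Text -> T.Text
normaliseUrl u = maybe path id (percentDecode path)
  where
    path = T.takeWhile (\c -> c /= '?' && c /= '#') u
```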
### 1.5 `Citations.hs`

**L-1.5.1 — LOW — Partial functions in `transformInline`.** Line 142: `head keys` / `head nums`. These are guarded by the `null nums` check above and by the structure of Pandoc `Cite` (never empty from the parser), so they are safe in practice. Swap to `case nums of (n:_) -> ...`.

**L-1.5.2 — LOW — `markerHtml` does not escape `data-cite-keys`.** Numbers are rendered via `tshow`, but `data-cite-keys` is built by concatenating citation keys into a space-separated list with no escaping. If a key contains a quote character (unusual but legal), the attribute breaks.

**NIT — `stripRefPrefix` (line 209)** is specific to the `"ref-"` prefix; rename it `stripPandocRefPrefix`, or document it with a pointer to the Pandoc source that emits that prefix.
### 1.6 `Compilers.hs`

**L-1.6.1 — LOW — `pageCompiler` does not save a `toc` snapshot.** That is fine for pages that use `pageCtx`, but the commonplace, landing, and standalone pages, which would benefit from a TOC, never get one. Not a bug; an architectural choice worth documenting.

**NIT — `stringify`** is redefined here (lines 56-77), in addition to the copy in `Filters/Images.hs:119-132` and the one `Text.Pandoc.Shared` exports: three implementations. Pick one.
### 1.7 `Stability.hs`

**M-1.7.1 — MEDIUM — `readIgnore` uses lazy `readFile`.** Line 44: the handle stays open until the whole list is forced. Fine for a single-shot read, but the pattern is fragile; `Data.Text.IO.readFile` is strict.

**L-1.7.2 — LOW — `unsafeCompiler` for the git subprocess breaks Hakyll's dependency tracking.** `stabilityField` calls `git log` via `unsafeCompiler`, so Hakyll will not re-run the compiler when HEAD moves. That is expected (`make build` always runs `git add content/` + commit first, which updates mtimes), but it's fragile to reason about. Worth a note at the `unsafeCompiler` call site rather than only in the header docs.

**L-1.7.3 — LOW — `gitDates` ignores `stderr`.** Line 54: `(ec, out, _) <- readProcessWithExitCode ...` discards the error output. If the file isn't tracked yet, git prints a warning to stderr and the user sees nothing. Log it.

**NIT — `stabilityFromDates` classification is undocumented magic.** `n <= 5 && age < 90` → "revising". These thresholds should be named constants with comments stating their intent.
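A strict replacement for `readIgnore` (M-1.7.1), as a sketch. The file format (one path per line, with blank lines and `#` comments skipped) and the "missing file means empty list" behavior are assumptions, not taken from `Stability.hs`. `Data.Text.IO.readFile` reads the whole file and closes the handle before returning, so no handle outlives the call.

```haskell
import qualified Data.Text as T
import qualified Data.Text.IO as TIO
import System.Directory (doesFileExist)

-- | Read an ignore list strictly. Assumed format: one path per line;
-- blank lines and lines starting with '#' are skipped.
readIgnore :: FilePath -> IO [FilePath]
readIgnore path = do
  exists <- doesFileExist path
  if not exists
    then pure []
    else do
      contents <- TIO.readFile path  -- strict: whole file read, handle closed
      pure [ T.unpack line
           | raw <- T.lines contents
           , let line = T.strip raw
           , not (T.null line)
           , not (T.pack "#" `T.isPrefixOf` line)
           ]
```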
### 1.8 `Catalog.hs`

**M-1.8.1 — MEDIUM — `renderEntry` does not escape frontmatter.** `ceTitle`, `ceYear`, `ceDuration`, `ceInstrumentation`, and `ceUrl` are pasted directly into HTML via `concat`. This is consistent with the site's "author-controlled trusted HTML in titles" convention (`Stats.hs:180-186` calls this out explicitly), but `Catalog.hs` has *no such comment*. If a collaborator's frontmatter contains a stray `<` or a malformed entry, the HTML breaks silently.

Suggestion: adopt the `pageLink` convention from `Stats.hs`. Escape `href` via `safeHref`, and pass the title through `preEscapedToHtml` with a comment documenting the trust assumption.

**L-1.8.2 — LOW — `renderCategorySection` assumes a non-empty group.** Line 194: `categoryLabel (ceCategory (head g))`. `groupBy` on a non-empty list produces non-empty sublists, so this is safe, but `head` is still partial.

**NIT — `categoryRank` uses `lookup` where `elemIndex` is shorter:**

```haskell
-- Categories missing from categoryOrder sort after every known category.
categoryRank c = fromMaybe (length categoryOrder) (elemIndex c categoryOrder)
```
### 1.9 `Commonplace.hs`

**H-1.9.1 — HIGH — Operator-precedence bug in `renderChronoView` (lines 126-131).**

```haskell
renderChronoView entries =
  "<div class=\"cp-chrono\" id=\"cp-chrono\" hidden>"
    ++ if null sorted
         then "<p class=\"cp-empty\">No entries yet.</p>"
         else concatMap renderEntry sorted
    ++ "</div>"
```

This parses as `"..." ++ (if null sorted then "..." else (concatMap renderEntry sorted ++ "</div>"))`, because the `else` branch extends as far to the right as possible. When `sorted` is empty, the closing `</div>` is silently dropped. Fix: parenthesize the `if`, or bind the body to a name (for example, in a `where` clause).

**L-1.9.2 — LOW — `renderText` replaces `\n` with `<br>\n`** after escaping, which is correct, but it does not handle `\r`. Windows-style line endings would produce `\r<br>`, leaving stray `\r` characters in the HTML. Normalize line endings in `stripTrailingNL`.
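The H-1.9.1 fix, sketched with entries simplified to `String` and a hypothetical `renderEntry`. The real function works on commonplace entry records and uses its own sort order. Binding the body in a `where` clause keeps the conditional out of the `++` chain, so `</div>` is appended in both cases.

```haskell
import Data.List (sort)

-- | Hypothetical stand-in for the real per-entry renderer.
renderEntry :: String -> String
renderEntry e = "<p class=\"cp-entry\">" ++ e ++ "</p>"

renderChronoView :: [String] -> String
renderChronoView entries =
  "<div class=\"cp-chrono\" id=\"cp-chrono\" hidden>" ++ body ++ "</div>"
  where
    sorted = sort entries  -- stand-in ordering for illustration
    body
      | null sorted = "<p class=\"cp-empty\">No entries yet.</p>"
      | otherwise   = concatMap renderEntry sorted
```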
### 1.10 `Authors.hs`

**H-1.10.1 — HIGH — `allContent` omits directory-form essays.** Line 50:

```haskell
allContent = ("content/essays/*.md" .||. "content/blog/*.md") .&&. hasNoVersion
```

Compare `Tags.hs:69`, which adds `"content/essays/*/index.md"`. Any essay stored as `content/essays/foo/index.md` will NOT appear on its author's index page. This is the most likely source of silent "why isn't this essay on my author page?" bugs.

**L-1.10.2 — LOW — Duplicate of `Contexts.authorSlugify`.** `Authors.slugify` and `Contexts.authorSlugify` do the same thing with different definitions (the Contexts version normalizes before filtering; the Authors version filters after lowercasing). The two will diverge on Unicode edge cases. Consolidate.
### 1.11 `Utils.hs`

**L-1.11.1 — LOW — `wordCount` counts HTML tokens as words.** It is called from `Compilers.hs:172` on raw source `src` (Markdown, including any raw HTML) and from `Stats.hs:809` on tag-stripped HTML. On raw Markdown this miscounts `[display](url)` as three "words". Low severity, because the stat is approximate anyway, but worth noting when comparing `/stats/` numbers to `wc`.
### 1.12 `Pagination.hs`, `Tags.hs`, `SimilarLinks.hs`, `Metadata.hs`, `Main.hs`

No material issues. `Metadata.hs` is a two-line empty-module placeholder — delete or populate.

---
## 2. Pandoc filters (`build/Filters/*.hs`)

### 2.1 `Filters/Images.hs` — the big one

**C-2.1.1 — CRITICAL — `lowerExt` returns the basename, not the extension.** Line 110:

```haskell
lowerExt = map toLower . reverse . ('.' :) . takeWhile (/= '.') . tail . dropWhile (/= '.') . reverse
```

Trace for `"image.jpg"` (the composition runs right to left):

1. `reverse` → `"gpj.egami"`
2. `dropWhile (/= '.')` → `".egami"`
3. `tail` → `"egami"`
4. `takeWhile (/= '.')` → `"egami"`
5. `('.' :)` → `".egami"`
6. `reverse` → `"image."`
7. `map toLower` → `"image."`

So `lowerExt "image.jpg" == "image."`, which matches none of `.jpg`, `.jpeg`, `.png`, or `.gif`. **`isLocalRaster` is therefore `False` for every file**, the entire `<picture>`/WebP dispatch is dead code, and `tools/convert-images.sh` produces `.webp` companions that are never referenced. The function is also partial: for a filename with no `.`, `dropWhile` yields `""`, and `tail` throws once the result is forced.

Fix: `takeExtension` from `System.FilePath` is already imported elsewhere, and `filepath` is already pulled in transitively; replace the definition with

```haskell
lowerExt = map toLower . takeExtension
```

**M-2.1.2 — MEDIUM — `passedKvs` duplicate-emits `id`, `class`, `alt`, `title`.** Line 77:

```haskell
passedKvs = filter (\(k, _) -> k `notElem` ["loading", "data-lightbox"]) kvs
```

But `attrId`, `attrClasses`, `attrAlt`, and `attrTitle` above already emit those attributes from `(ident, classes, kvs)`. If an author writes `{.foo title="bar"}`, Pandoc places `title` into `kvs`, so the output becomes `<img ... class="foo" title="bar" title="bar">`. Expand the exclusion list:

```haskell
passedKvs = filter (\(k, _) -> k `notElem` ["loading", "data-lightbox", "id", "class", "alt", "title"]) kvs
```

Side note: the same issue affects the non-picture branch at line 47 indirectly (via the `Image` constructor Pandoc emits), but Pandoc's HTML writer handles deduplication there.

**M-2.1.3 — MEDIUM — `stringify` catches most but not all inline variants.** Lines 119-132 handle `Str`, `Space`, `SoftBreak`, `LineBreak`, `Emph`, `Strong`, `Code`, `Link`, `Image`, and `Span`, but miss `Strikeout`, `Superscript`, `Subscript`, `SmallCaps`, `Quoted`, `Cite`, `Math`, and `RawInline`. Alt text for an image captioned `~subscript~` will be empty.

**L-2.1.4 — LOW — `renderKvs` does not escape the key.** Line 94: `" " <> k <> "=\"" <> esc v <> "\""`. Keys come from Markdown attribute syntax and can only be identifiers, so this is safe in practice, but it is asymmetric with `v` and deserves either `esc k` or a comment stating the assumption.

**L-2.1.5 — LOW — `isUrl` does not recognize `data:`, `file://`, or `mailto:` URLs.** None of these matter for local image sources, so the check is accurate for its intended domain.
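A standalone check of the C-2.1.1 fix. `isRasterExt` is a hypothetical reduction of `isLocalRaster` that keeps only the extension test; the real function presumably also rejects remote URLs. The extension list is the one quoted in the finding. `takeExtension` looks only at the file-name component, so a dotted directory name such as `assets.v2/` does not produce a spurious extension.

```haskell
import Data.Char (toLower)
import System.FilePath (takeExtension)

-- | Lower-cased extension including the leading dot ("" if there is none).
lowerExt :: FilePath -> String
lowerExt = map toLower . takeExtension

-- | Hypothetical extension-only version of isLocalRaster.
isRasterExt :: FilePath -> Bool
isRasterExt p = lowerExt p `elem` [".jpg", ".jpeg", ".png", ".gif"]
```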
### 2.2 `Filters/Transclusion.hs`

**M-2.2.1 — MEDIUM — `sec` attribute not HTML-escaped.** Line 41:

```haskell
Just (slugToUrl slug, " data-section=\"" ++ sec ++ "\"")
```

`sec` is everything after `#` up to `}}` in the Markdown source. If an author writes `{{essay#a"b}}`, the emitted HTML is `<div … data-section="a"b">`, which is invalid markup. This is not a realistic XSS vector on a single-author static site (it would be a self-attack), but:

- It's an injection primitive. The moment content comes from a PR, a collaborator, or an imported source, it becomes a real vulnerability.
- The fix is one line: escape `"`, `<`, `>`, and `&` before interpolation.

**L-2.2.2 — LOW — `slugToUrl` appends `.html` unconditionally.** Lines 46-49: `slug ++ ".html"`. If the slug is already `page.html`, the result is `page.html.html`. Unlikely in practice (the source convention is `{{essay-slug}}` with no extension), but worth guarding against.

**NIT — `trim` re-implemented yet again.** The same function appears at least four times (`Transclusion.hs:59`, `EmbedPdf.hs:80`, `Wikilinks.hs:59`, plus `Contexts.hs`'s `strip`). Factor it out.
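A sketch of a single shared escaper that could replace the duplicated `escHtml` copies, applied to the M-2.2.1 attribute. It uses `T.concatMap`, which visits each character once, so unlike chained `T.replace` calls there is no replacement-order invariant to maintain. `dataSectionAttr` is a hypothetical helper name. Single quotes are not escaped, which is sufficient only because the attribute is double-quoted.

```haskell
import qualified Data.Text as T

-- | Escape text for element content or a double-quoted attribute value.
escHtml :: T.Text -> T.Text
escHtml = T.concatMap esc
  where
    esc '&' = T.pack "&amp;"
    esc '<' = T.pack "&lt;"
    esc '>' = T.pack "&gt;"
    esc '"' = T.pack "&quot;"
    esc c   = T.singleton c

-- | Hypothetical Transclusion helper: build the data-section attribute
-- from an author-supplied section name.
dataSectionAttr :: String -> String
dataSectionAttr sec =
  " data-section=\"" ++ T.unpack (escHtml (T.pack sec)) ++ "\""
```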
### 2.3 `Filters/Score.hs`

**H-2.3.1 — HIGH — `TIO.readFile fullPath` with no existence check and no exception handling.** Line 40. A Markdown file that references a missing SVG aborts the entire Hakyll build with nothing more than:

```
openFile: does not exist (No such file or directory)
```

No filename, no page context, no recovery. Fix:

```haskell
-- needs: System.Directory (doesFileExist), System.IO (hPutStrLn, stderr)
exists <- doesFileExist fullPath
if not exists
  then do
    hPutStrLn stderr ("[Score] missing SVG: " ++ fullPath)
    return (Div ("", cls, attrs) blocks)  -- leave the block unchanged
  else do
    svgRaw <- TIO.readFile fullPath
    ...
```

Or wrap the read in `try` and fall back to an `errorBlock` mirroring `Filters.Viz.errorBlock`.

**M-2.3.2 — MEDIUM (retracted) — Lazy-I/O `readFile` under `walkM`.** `Data.Text.IO.readFile` reads the file strictly, so the generic lazy-I/O concern does not apply here. The real issue is H-2.3.1 above.

**L-2.3.3 — LOW — `processColors` is order-sensitive.** The comment on lines 56-58 acknowledges it: the 6-digit hex replacements come *last* in the composition chain, which means they're applied *first*. That is correct and the comment is helpful; keep it.

**L-2.3.4 — LOW (retracted) — `escHtml` replacement order.** Lines 88-92:

```haskell
escHtml = T.replace "\"" "&quot;"
        . T.replace ">"  "&gt;"
        . T.replace "<"  "&lt;"
        . T.replace "&"  "&amp;"
```

`&` must be replaced *first*; otherwise the `&` inside entities produced by the other replacements would itself be escaped, turning `&lt;` into `&amp;lt;`. Since `f . g . h` applied to `x` is `f (g (h x))`, the chain runs bottom-up: `&`, then `<`, then `>`, then `"`. **The order is correct**, so this finding is retracted; `Viz.escHtml` (`Viz.hs:178-182`) uses the same composition order and is also correct. Nit only: keep it as a single chain with a comment stating the invariant.
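The `try` alternative from H-2.3.1, as a sketch. `readAsset` is a hypothetical helper, and passing the referencing page's path in as an argument is an assumption about what the Score filter has available. On failure it reports both paths on stderr and returns `Left`, so the caller can render an error block instead of aborting the build.

```haskell
import Control.Exception (IOException, try)
import qualified Data.Text as T
import qualified Data.Text.IO as TIO
import System.IO (hPutStrLn, stderr)

-- | Read an asset referenced from a page; never throws on I/O errors.
readAsset :: FilePath -> FilePath -> IO (Either String T.Text)
readAsset page path = do
  result <- try (TIO.readFile path) :: IO (Either IOException T.Text)
  case result of
    Right contents -> pure (Right contents)
    Left err -> do
      let msg = page ++ ": cannot read " ++ path ++ " (" ++ show err ++ ")"
      hPutStrLn stderr ("[Score] " ++ msg)
      pure (Left msg)
```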
### 2.4 `Filters/Viz.hs`

**H-2.4.1 — HIGH — No file-existence check before `readProcessWithExitCode`.** Lines 96-99. This is the same class of bug as in Score: the user sees `"non-zero exit"` with no path. Add a `doesFileExist fullPath` check before spawning.

**M-2.4.2 — MEDIUM (retracted) — Exception handler drops the exception detail.** Line 99:

```haskell
`catch` (\e -> return (ExitFailure 1, "", show (e :: IOException)))
```

The third tuple element is `show e`, and on line 102 the caller reads it as `err` and displays it, so the detail is preserved. The error then reaches `errorBlock`, which renders `<div class="viz-error">...</div>` inline in the page. That is graceful handling; no change needed.

**L-2.4.3 — LOW — `escScriptTag` only replaces `</`.** Line 133. That is correct for JSON embedding, but not for content that contains `<!--` or `]]>` inside strings. Vega-Lite specs won't contain those, so this is fine.

**L-2.4.4 — LOW — `warn` uses `putStrLn` to stdout, not stderr.** Line 176. Its output mixes with Hakyll's build progress. Use `hPutStrLn stderr`.
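A sketch covering H-2.4.1 and L-2.4.4 together. `runVizScript` is a hypothetical wrapper. It checks that the script exists before spawning `python3`, and every failure message names the script and goes to stderr. The existing `catch` around the spawn would still be needed in the real filter, because `readProcessWithExitCode` itself throws if `python3` cannot be started.

```haskell
import System.Directory (doesFileExist)
import System.Exit (ExitCode (..))
import System.IO (hPutStrLn, stderr)
import System.Process (readProcessWithExitCode)

-- | Run a visualization script, returning its stdout or a descriptive error.
runVizScript :: FilePath -> IO (Either String String)
runVizScript fullPath = do
  exists <- doesFileExist fullPath
  if not exists
    then failWith ("script not found: " ++ fullPath)
    else do
      (ec, out, err) <- readProcessWithExitCode "python3" [fullPath] ""
      case ec of
        ExitSuccess   -> pure (Right out)
        ExitFailure n ->
          failWith (fullPath ++ " exited with code " ++ show n ++ ": " ++ err)
  where
    -- Log to stderr (not stdout) and hand the message back to the caller.
    failWith msg = do
      hPutStrLn stderr ("[Viz] " ++ msg)
      pure (Left msg)
```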
### 2.5 `Filters/Sidenotes.hs`

**H-2.5.1 — HIGH — Label wrap at 26 produces duplicate IDs.** Line 38:

```haskell
toLabel n = T.singleton (toEnum (fromEnum 'a' + (n - 1) `mod` 26))
```

Note 27 maps to `a` again, producing two `<sup id="snref-a">` and two `<sup id="sn-a">` in the same document. Duplicate IDs are invalid HTML, break `href="#sn-a"` fragment navigation, and confuse assistive technology.

Fix options:

1. Use numeric labels: `"sn" ++ show n`.
2. Use two-letter labels for n > 26: `aa`, `ab`, …, `zz`.
3. Fail loudly with `error`: essays with more than 26 footnotes are rare, and the author should know.

**M-2.5.2 — MEDIUM — `replacePTags` is a string-level hack.** Lines 57-60:

```haskell
replacePTags =
    T.replace "<p>" "<span class=\"sidenote-para\">"
  . T.replace "</p>" "</span>"
```

Rewriting rendered HTML by string replacement is fragile. For example, a `<p>` that carries attributes (say, from raw HTML inside the footnote) is left untouched, while its `</p>` is still rewritten to `</span>`, producing mismatched tags. The correct fix is to transform the AST before writing, not the post-rendered HTML.
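Option 2 from H-2.5.1 generalizes to bijective base-26 labels, the scheme spreadsheet columns use: `a` to `z`, then `aa` to `zz`, then `aaa`, with no upper limit and no repeats. This is a sketch that keeps the original's `Text` result type. Reusing the name `toLabel` is only for illustration.

```haskell
import qualified Data.Text as T

-- | 1 -> "a", 26 -> "z", 27 -> "aa", 702 -> "zz", 703 -> "aaa".
-- Bijective base 26 has no zero digit, hence the (k - 1) before divMod.
toLabel :: Int -> T.Text
toLabel n
  | n < 1     = error ("toLabel: note numbers start at 1, got " ++ show n)
  | otherwise = T.pack (go n)
  where
    go :: Int -> String
    go 0 = ""
    go k =
      let (q, r) = (k - 1) `divMod` 26
      in go q ++ [toEnum (fromEnum 'a' + r)]
```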
### 2.6 `Filters/Links.hs`

**M-2.6.1 — MEDIUM — `isExternal` uses substring match for the site domain.** Line 59:

```haskell
isExternal url =
  ("http://" `T.isPrefixOf` url || "https://" `T.isPrefixOf` url)
    && not ("levineuwirth.org" `T.isInfixOf` url)
```

`https://evil-levineuwirth.org.attacker.com/phish` contains `levineuwirth.org` as a substring, so it is classified as *internal* and gets no `rel=noopener noreferrer target=_blank`. In 2026, with partitioned cookies, this is mostly a cosmetic concern, but the fix is trivial: compare the parsed host instead.

```haskell
-- needs: OverloadedStrings, Control.Applicative ((<|>)), qualified Data.Text as T
isSameHost url =
  case T.stripPrefix "https://" url <|> T.stripPrefix "http://" url of
    Nothing -> False
    Just rest ->
      let host = T.toLower (T.takeWhile (\c -> c /= '/' && c /= ':' && c /= '?' && c /= '#') rest)
      in host == "levineuwirth.org" || ".levineuwirth.org" `T.isSuffixOf` host
```

Matching `".levineuwirth.org"` as a suffix, with the leading dot, admits real subdomains but not look-alike hosts such as `evil-levineuwirth.org`.

**M-2.6.2 — MEDIUM — PDF links with a fragment are not rewritten.** Lines 30-36 require `` ".pdf" `T.isSuffixOf` url ``, but a URL like `/papers/foo.pdf#page=5` ends in `page=5`, not `.pdf`, so it doesn't route through the PDF.js viewer. Compare `EmbedPdf.hs`, which does handle fragments in the source-preprocessor path. Inconsistent.

**L-2.6.3 — LOW — `domainIcon` duplicates the twitter/x and youtube/youtu.be mappings.** Fine. Nit: a table-driven `lookup` would be cleaner than the chain of guards.
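For M-2.6.2, a sketch of a fragment-aware PDF check. `isPdfLink` and `splitFragment` are hypothetical names. The idea is to test the path part only, then carry the fragment (e.g. `#page=5`) over to the viewer URL. How the real filter assembles the viewer URL is not shown here.

```haskell
import qualified Data.Text as T

-- | True if the URL's path (before any ?query or #fragment) ends in .pdf.
isPdfLink :: T.Text -> Bool
isPdfLink url = T.pack ".pdf" `T.isSuffixOf` pathPart
  where
    pathPart = T.takeWhile (\c -> c /= '?' && c /= '#') url

-- | Split "/papers/foo.pdf#page=5" into ("/papers/foo.pdf", "#page=5");
-- the second component is empty when there is no fragment.
splitFragment :: T.Text -> (T.Text, T.Text)
splitFragment = T.breakOn (T.pack "#")
```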
### 2.7 `Filters/Wikilinks.hs`

**M-2.7.1 — MEDIUM — `toMarkdownLink` does not escape `]` or `)`.** Lines 33-36:

```haskell
toMarkdownLink inner =
  let (title, display) = splitOnPipe inner
      url = "/" ++ slugify title
  in "[" ++ display ++ "](" ++ url ++ ")"
```

If the display text contains `]` (or the URL contains `)`), the generated Markdown is broken, and Pandoc will parse it as raw text or as a malformed link. Rare in practice (wikilink display text is usually a plain name), but worth escaping.

**L-2.7.2 — LOW — `slugify` uses `intercalate "-" . words . ...`, so `"a.b"` → `"a b"` → `"a-b"`.** That's by design (punctuation becomes a space, which becomes a hyphen). Inputs like `"end."` do not get a trailing hyphen: the dot becomes a trailing space, `words` drops it, and the result is `"end"`. OK.

**NIT — Inefficient `trim`: `reverse . dropWhile (== ' ') . reverse . dropWhile (== ' ')`.** With `Text` inputs this would simply be `T.strip`. With the current `String` pipeline, `dropWhileEnd isSpace . dropWhile isSpace` (from `Data.List` and `Data.Char`) avoids the double `reverse`.
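A sketch of the escaping for M-2.7.1. CommonMark and Pandoc's Markdown both treat a backslash before ASCII punctuation as a literal character, so backslash-escaping `\`, `[`, and `]` keeps arbitrary display text inside the link. `toMarkdownLink'` is a hypothetical variant. It assumes the URL comes from `slugify` and therefore contains no `)`, so only the display text is escaped.

```haskell
-- | Backslash-escape characters that are special inside Markdown link text.
escapeLinkText :: String -> String
escapeLinkText = concatMap esc
  where
    esc c
      | c `elem` "\\[]" = ['\\', c]
      | otherwise       = [c]

-- | Hypothetical variant of toMarkdownLink taking display text and an
-- already-slugified URL.
toMarkdownLink' :: String -> String -> String
toMarkdownLink' display url =
  "[" ++ escapeLinkText display ++ "](" ++ url ++ ")"
```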
### 2.8 `Filters/EmbedPdf.hs`

**M-2.8.1 — MEDIUM — `encodeQueryValue` does not encode `#`.** Lines 68-76: the encoder is called on `filePath`, which `parseDirective` (line 38) has already split on `#`, so the unencoded `#` does not bite here. However, the docstring at line 65 says "percent-encode characters that would break a query-string value", and `#` is such a character. Add it for defense in depth, even though the current call site doesn't need it.

**L-2.8.2 — LOW — `parsePageHash` silently produces `""` for invalid fragments.** Lines 45-51. An author writing `{{pdf:/foo.pdf#garbage}}` silently loses the fragment, with no warning.
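For M-2.8.1, a sketch of a query-value encoder that covers `#`. This is a hypothetical reimplementation, not the existing `encodeQueryValue`. It passes through RFC 3986 unreserved characters plus `/` (which is legal in a query). Every other byte of the UTF-8 encoding, including `#`, `&`, `=`, `?`, `+`, and space, becomes `%XX`.

```haskell
import qualified Data.ByteString as BS
import Data.Char (chr, isAsciiLower, isAsciiUpper, isDigit, toUpper)
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE
import Data.Word (Word8)
import Numeric (showHex)

-- | Percent-encode a value for use inside a query string.
encodeQueryValue :: String -> String
encodeQueryValue = concatMap enc . BS.unpack . TE.encodeUtf8 . T.pack
  where
    enc :: Word8 -> String
    enc w
      | passThrough c = [c]
      | otherwise     = '%' : hex2 w
      where
        c = chr (fromIntegral w)
    -- Unreserved characters (RFC 3986) plus '/'; bytes >= 0x80 never match.
    passThrough c =
      isAsciiUpper c || isAsciiLower c || isDigit c || c `elem` "-._~/"
    hex2 w = map toUpper (pad (showHex w ""))
    pad s = if length s < 2 then '0' : s else s
```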
### 2.9 `Filters/Typography.hs`, `Filters/Code.hs`, `Filters/Smallcaps.hs`, `Filters/Dropcaps.hs`, `Filters/Math.hs`

Scanned via the parallel sub-audit; only nit-level findings apply (duplicate `escHtml`, a smart-quote edge case in abbreviation matching, `apply = id` placeholders).

---
## 3. Static JavaScript (`static/js/*.js`)

Audited by parallel exploration. The full per-file list is long; the aggregate pattern is that **no user-authored content is ever injected**, so `innerHTML` usage across `popups.js`, `annotations.js`, `citations.js`, and `selection-popup.js` is **not an XSS vector under the current authoring model**. The risk profile changes the moment the site accepts PRs, gains an annotations backend, or proxies third-party content (none of which are planned per `spec.md`).
### 3.1 XSS surface (all author-trust scoped)

**M-3.1.1 — MEDIUM — `popups.js:608-614` copies `innerHTML` from the page into the popup.** The `epistemicContent` provider does `html += '<div class="ep-compact">' + compact.innerHTML + '</div>'`. Because the source (`.ep-compact`) is emitted by our own Haskell code (`Contexts.hs` + templates), this is safe under the trust model. Switching to `compact.cloneNode(true)` + `popup.appendChild()` is a defense-in-depth fix that costs nothing.

**M-3.1.2 — MEDIUM — `popups.js` cross-origin fetches (Wikipedia, arXiv, CrossRef, GitHub, etc.) don't validate `Content-Type`.** A malicious CORS-enabled endpoint could return HTML that the popup would render. Every fetched value already passes through `esc()` (lines 655-661), so the residual risk is limited to a code path where a fetched value reaches the DOM without going through `esc()`.

**L-3.1.3 — LOW — `citations.js:15, 56` and `annotations.js:167-172` use `innerHTML` with escaped data.** The escaping is correct; the fragility is that the escape-before-concatenate pattern is easy to get wrong in future edits.
### 3.2 Event handling / lifecycles

**M-3.2.1 — MEDIUM — `sidenotes.js:73-94` attaches listeners per sidenote with no cleanup path.** When `transclude.js` re-renders a fragment on resize, sidenotes accumulate duplicate handlers, so `update()` runs 2×, 3×, … on hover over the same sidenote. The output is unaffected, but it is a measurable leak over a long session.

**M-3.2.2 — MEDIUM — `popups.js` attaches listeners at load time and never re-binds for transcluded content.** A transcluded essay's internal links get no popup previews. If transclusion is meant to feel "live", this is a user-visible gap.

**M-3.2.3 — MEDIUM — `semantic-search.js:66-74` race in `loadModel`.** If two searches fire before the first model load resolves, both call `import()` and `pipeline()`, and the second call wastes CPU and memory. Track the in-flight Promise:

```js
// loadPromise lives at module scope and starts as null
if (loadPromise) return loadPromise;        // reuse the in-flight (or finished) load
loadPromise = import(CDN).then(/* ... */);  // the first caller starts the load
return loadPromise;
```
### 3.3 Accessibility

**H-3.3.1 — HIGH — `gallery.js` overlay has no focus trap.** `openOverlay()` focuses the close button, but Tab then moves focus out of the overlay into the page behind it. Pattern to copy: `settings.js:35-49`.

**M-3.3.2 — MEDIUM — `selection-popup.js` annotation-picker color swatches are mouse-only.** Arrow-key navigation plus Enter to select would make the picker keyboard-accessible.

**M-3.3.3 — MEDIUM — `sidenotes.js` sidenote focus toggle is click-only.** There is no keyboard equivalent.

**L-3.3.4 — LOW — `lightbox.js:18,42` defaults `img.alt` to `""` and only later populates it from the source image.** An empty `alt` marks an image as decorative. If the source has no alt text, the lightbox image therefore has no accessible name. Use `img.alt = srcAlt || 'Lightbox image'`.

**L-3.3.5 — LOW — `theme.js:9-28` does not wrap `localStorage.getItem` in `try/catch`.** Touching `localStorage` can throw. Examples include a `SecurityError` when site storage is blocked, and older Safari private-browsing modes, which threw on `setItem`. The code works in the common case, where a missing key simply yields `null`, but in every other browser configuration an exception here would abort theme initialization.

### 3.4 Duplication and style

**L-3.4.1 — LOW — HTML escaping is reimplemented 3× across `annotations.js`, `popups.js`, and `semantic-search.js`.** Add a shared `utils.js` (one function).

**L-3.4.2 — LOW — Mixed `var` vs `const`/`let`.** `citations.js`, `nav.js`, `sidenotes.js`, and `toc.js` use ES6+ declarations; `popups.js`, `annotations.js`, and `gallery.js` use `var`. Pick one.

**NIT — Scattered magic numbers** for delays (`SHOW_DELAY=250`, `HIDE_DELAY=150`, `SHOW_DELAY=450`, swipe threshold `30`, etc.). Not worth a refactor.

---

## 4. CSS and HTML templates

Audited by parallel exploration. Highlights:

### 4.1 CSS

**H-4.1.1 — HIGH — Undefined CSS custom properties.** Three custom properties are used but never defined in `base.css`:

- `--rule`, used by `build.css` (lines 21, 30, 39, 69);
- `--bg-subtle`, used at `components.css:1448`;
- `--font-ui`, used in many rules.

A `var()` reference to an undefined property with no fallback makes the declaration invalid at computed-value time. The property then falls back to its inherited or initial value. The result is silent visual degradation on the `/build/` page and the annotation-related pages.

Fix:

```css
:root {
  --rule: var(--border-muted);
  --font-ui: var(--font-sans);
  --bg-subtle: #f5f5f5;
}

[data-theme="dark"] { --bg-subtle: #1f1f1f; }
```

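Because this class of bug is silent in the browser, a small lint pass could catch it before deploy. The following is a minimal Python sketch, not existing tooling. `undefined_custom_properties` is a hypothetical helper with three simplifications:

- It uses regexes rather than a real CSS parser.
- It treats any `--name:` anywhere as a definition, ignoring selector scope.
- It skips `var()` references that carry a fallback, since those degrade predictably.

Run as a script, it scans `static/css/*.css` and prints each offending property with the files that use it.

```python
import re
from pathlib import Path

# Approximate CSS with regexes (a sketch, not a parser): a definition is
# "--name:" and a reference is "var(--name", optionally followed by a comma,
# which means the reference supplies its own fallback value.
DECL_RE = re.compile(r"(--[\w-]+)\s*:")
USE_RE = re.compile(r"var\(\s*(--[\w-]+)\s*(,?)")


def undefined_custom_properties(css_sources: dict[str, str]) -> dict[str, set[str]]:
    """Map each custom property used without definition or fallback to the files using it."""
    declared: set[str] = set()
    for text in css_sources.values():
        declared.update(DECL_RE.findall(text))

    missing: dict[str, set[str]] = {}
    for filename, text in css_sources.items():
        for prop, comma in USE_RE.findall(text):
            if prop not in declared and not comma:
                missing.setdefault(prop, set()).add(filename)
    return missing


if __name__ == "__main__":
    css_dir = Path("static/css")  # directory named in this audit's scope
    sources = {p.name: p.read_text(encoding="utf-8") for p in sorted(css_dir.glob("*.css"))}
    for prop, files in sorted(undefined_custom_properties(sources).items()):
        print(f"{prop}: used in {', '.join(sorted(files))}")
```
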
**H-4.1.2 — HIGH — Dark-mode `--text-faint` contrast fails WCAG AA.** `#6a6660` on `#121212` is roughly 3.3:1, below the 4.5:1 AA minimum for normal-size text. It is used for sidenote numbers (0.65em!) and disabled-state icons. Bumping it to ~`#8b8680` gives roughly 5.2:1, which clears AA.

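The WCAG 2.x contrast ratio is simple to compute, which makes palette changes easy to check before they ship. Below is a minimal Python sketch; the function names are illustrative, not part of the codebase.

WCAG 2.x writes the linearization threshold as 0.03928, while the sRGB standard uses 0.04045. No 8-bit channel value falls between the two, so both give identical results. With this code, `contrast_ratio("#6a6660", "#121212")` evaluates to about 3.3 and `contrast_ratio("#8b8680", "#121212")` to about 5.2.

```python
def _linearize(channel: int) -> float:
    """Convert an 8-bit sRGB channel to linear light (sRGB transfer function)."""
    c = channel / 255
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    """WCAG 2.x relative luminance of a "#rrggbb" color."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linearize(r) + 0.7152 * _linearize(g) + 0.0722 * _linearize(b)


def contrast_ratio(fg: str, bg: str) -> float:
    """WCAG contrast ratio (L1 + 0.05) / (L2 + 0.05), where L1 is the lighter color."""
    lighter, darker = sorted((relative_luminance(fg), relative_luminance(bg)), reverse=True)
    return (lighter + 0.05) / (darker + 0.05)


if __name__ == "__main__":
    for fg in ("#6a6660", "#8b8680"):
        ratio = contrast_ratio(fg, "#121212")
        verdict = "passes" if ratio >= 4.5 else "fails"
        print(f"{fg} on #121212: {ratio:.2f}:1 ({verdict} AA for normal text)")
```
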
**H-4.1.3 — HIGH — TOC collapse hides content from keyboard + AT.** `components.css:433-436` uses `visibility: hidden` on the collapsed TOC. This removes it from both the tab order and the accessibility tree. Possible fixes:

- put `aria-expanded` on the toggle and use a height transition;
- set `aria-hidden="true"` explicitly;
- use `display: none` (losing the smooth collapse).

**H-4.1.4 — HIGH — No consistent `:focus-visible` ring across interactive elements.** `.nav-portal-toggle`, `.settings-toggle`, `.toc-toggle`, and `.annotation-toggle` lack focus styles. Add a global rule:

```css
button:focus-visible,
a:focus-visible {
  outline: 2px solid var(--text);
  outline-offset: 2px;
}
```

**M-4.1.5 — MEDIUM — Hardcoded hex values in `print.css`.** `#fff`, `#000`, `#f9f9f9`, and `#ddd` bypass the variables. Move them into a `:root` overrides block inside `@media print`.

**M-4.1.6 — MEDIUM — Breakpoints are scattered.** `540px`, `680px`, `900px`, `1100px`, and `1500px` appear across files with no central definition. Record them once in `base.css`:

```css
:root {
  --bp-phone: 540px;
  --bp-tablet: 680px;
  --bp-desktop: 900px;
  --bp-large: 1100px;
  --bp-wide: 1500px;
}
```

(Note: custom properties cannot be used inside `@media` conditions, so this block only documents the canonical values. Enforcing them takes Sass or another preprocessor, or a comment plus grep discipline.)

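The grep-discipline option can be mechanized. The sketch below is a hypothetical Python helper, `stray_breakpoints`. It pulls pixel values out of `@media` preludes and reports any that fall outside the five values named in M-4.1.6 (adjust the set if it changes).

It is regex-based and deliberately strict: an off-by-one form such as `max-width: 539px` is reported too, which is arguably what you want while consolidating.

```python
import re
from pathlib import Path

# The breakpoint values named in M-4.1.6 (assumed canonical); anything else
# inside an @media condition is flagged.
CANONICAL_BREAKPOINTS = {540, 680, 900, 1100, 1500}

MEDIA_PRELUDE_RE = re.compile(r"@media[^{]*\{")  # "@media ..." up to the opening brace
PX_RE = re.compile(r"(\d+(?:\.\d+)?)px")


def stray_breakpoints(css_text: str, allowed: set[int] = CANONICAL_BREAKPOINTS) -> list[float]:
    """Return pixel values used in @media conditions that are not in `allowed`."""
    stray = []
    for prelude in MEDIA_PRELUDE_RE.findall(css_text):
        for value in PX_RE.findall(prelude):
            if float(value) not in allowed:
                stray.append(float(value))
    return stray


if __name__ == "__main__":
    for css_file in sorted(Path("static/css").glob("*.css")):
        for value in stray_breakpoints(css_file.read_text(encoding="utf-8")):
            print(f"{css_file.name}: non-canonical breakpoint {value:g}px")
```
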
**L-4.1.7 — LOW — Inconsistent transition timings.** `0.15s`, `0.28s`, `0.3s`, `0.35s`, and `0.5s` are scattered across files. Three tokens would cover all cases.

**L-4.1.8 — LOW — Overbroad `font-variant` shorthand.** `reading.css:95` and `library.css:22` use `font-variant: small-caps`. The shorthand is not deprecated, but it resets every other `font-variant-*` longhand (ligatures, numeric forms, and so on) to its initial value on that element. Use `font-variant-caps: small-caps` to change only the caps setting.

### 4.2 HTML templates

**M-4.2.1 — MEDIUM — `templates/default.html:30-33` inline onload script.** The KaTeX bootstrap is an inline `onload` attribute containing a multi-line JS expression. It works, but any future strict CSP would have to allow `'unsafe-inline'` (or `'unsafe-hashes'`) to keep it running. Move it to an external `katex-bootstrap.js` served from `/js/`.

**L-4.2.2 — LOW — `templates/partials/nav.html` buttons lack `type="button"`.** A `<button>` without a `type` defaults to `type="submit"`. If the nav is ever placed inside a `<form>`, pressing a nav button would submit the form. Belt-and-suspenders fix: add `type="button"` to every `<button>` that isn't a submit.

**L-4.2.3 — LOW — `templates/partials/head.html` loads all component CSS unconditionally, plus three conditional files.** This is not a perf bug on HTTP/2, but `components.css` (1464 lines) is loaded even on the homepage. Split it.

---

## 5. Python tooling (`tools/*.py`)

### 5.1 `tools/embed.py`

**H-5.1.1 — HIGH — Root URL becomes `"/./"`.** Lines 68-73:

```python
def _url_from_path(html_path: Path) -> str:
    rel = html_path.relative_to(SITE_DIR)
    if rel.name == "index.html":
        url = "/" + str(rel.parent) + "/"
        return url.replace("//", "/")
    return "/" + str(rel)
```

For `_site/index.html`, `rel.parent` is `Path(".")` and `str(Path("."))` is `"."`, so the result is `"/./"`. The `replace("//", "/")` cleanup never sees a double slash. Haskell's `SimilarLinks.normaliseUrl` produces `"/"` for the same route, so the lookup fails and the homepage never gets similar-links suggestions.

Fix (also switching to `as_posix()` so URLs use forward slashes on every platform):

```python
def _url_from_path(html_path: Path) -> str:
    rel = html_path.relative_to(SITE_DIR)
    if rel.name == "index.html":
        parent = rel.parent.as_posix()
        if parent in (".", ""):
            return "/"  # _site/index.html is the site root
        return "/" + parent + "/"
    return "/" + rel.as_posix()
```

**L-5.1.2 — LOW — No `--quiet` mode.** `embed.py` prints progress unconditionally, so CI builds get noise.

**L-5.1.3 — LOW — `needs_update()` uses `rglob("*.html")` over `_site`.** That is fine, but for a large `_site/` it re-stats every HTML file on every build. A single watermark file could cache the result.

**NIT — The `EXCLUDE_URLS` comparison against `/search/`, `/build/`, etc. works only because `_url_from_path` produces exactly those forms.** A refactor could silently break the set. Document the coupling.

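One way to document that coupling is a small regression check. It pins the URL forms that `EXCLUDE_URLS` and `SimilarLinks` rely on, including the root case from H-5.1.1. A few names in this Python sketch are hypothetical:

- `url_from_path` restates the fixed `_url_from_path` with the site directory passed in explicitly; the real function reads the module-level `SITE_DIR`.
- `check_url_forms` is a new helper.
- The `essays/example` and `about.html` paths are made-up examples.

```python
from pathlib import Path


def url_from_path(site_dir: Path, html_path: Path) -> str:
    """Map a file under site_dir to its site URL, with the root index mapped to "/"."""
    rel = html_path.relative_to(site_dir)
    if rel.name == "index.html":
        parent = rel.parent.as_posix()
        return "/" if parent in (".", "") else "/" + parent + "/"
    return "/" + rel.as_posix()


def check_url_forms(site_dir: Path = Path("_site")) -> None:
    """Fail loudly if a URL form the rest of the pipeline depends on changes."""
    expected = {
        site_dir / "index.html": "/",                    # H-5.1.1: must not be "/./"
        site_dir / "search" / "index.html": "/search/",  # matched by EXCLUDE_URLS
        site_dir / "build" / "index.html": "/build/",    # matched by EXCLUDE_URLS
        site_dir / "essays" / "example" / "index.html": "/essays/example/",  # made-up path
        site_dir / "about.html": "/about.html",          # made-up path
    }
    for path, url in expected.items():
        actual = url_from_path(site_dir, path)
        if actual != url:
            raise AssertionError(f"{path}: expected {url!r}, got {actual!r}")


if __name__ == "__main__":
    check_url_forms()
    print("URL forms OK")
```
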
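### 5.2 `tools/import-poetry.py`

**H-5.2.1 — HIGH — `yaml_str()` does not escape newlines.** Lines 193-203. An `abstract`, `attribution`, or first line containing `\n` yields invalid YAML. Add `\n` and `\r` to the `needs_quote` character set. Also make sure the quoted form escapes them (as `\n` inside a double-quoted scalar) rather than embedding a raw line break, which YAML folds into a space.
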
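A more robust route than growing the `needs_quote` set is to fall back to a double-quoted scalar whenever a value is not obviously safe, and let `json.dumps` produce it. A JSON string is also valid YAML double-quoted syntax, and `json.dumps` escapes newlines, quotes, and backslashes.

This Python sketch assumes nothing about the current `yaml_str` beyond its name and purpose. The character set and reserved-word list are illustrative, not a complete reading of the YAML spec.

```python
import json

# Characters that make a plain YAML scalar ambiguous or invalid (illustrative,
# not exhaustive). The key additions for H-5.2.1 are \n and \r.
_NEEDS_QUOTE = set(":#{}[],&*?|<>=!%@`\"'\\\n\r\t")

# Plain scalars that YAML 1.1 loaders (e.g. PyYAML) may read as booleans or null.
_RESERVED = {"", "~", "null", "true", "false", "yes", "no", "on", "off", "y", "n"}


def yaml_str(value: str) -> str:
    """Render `value` as a YAML scalar that loads back as the same string."""
    if (
        value.lower() in _RESERVED          # also catches the empty string
        or value != value.strip()           # leading/trailing whitespace would be lost
        or value[0] in "-.0123456789"       # could read as a list item or a number
        or any(ch in _NEEDS_QUOTE for ch in value)
    ):
        # json.dumps emits a double-quoted string with \n, \" and \\ escaped,
        # which is also valid YAML double-quoted scalar syntax.
        return json.dumps(value, ensure_ascii=False)
    return value
```
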
**H-5.2.2 — HIGH — Empty `title_prefix` / `collection_slug` silently collide.** Line 328. If `--collection` is all punctuation, the slug becomes empty and every poem writes to the same path. Add an up-front check:

```python
if not collection_slug or collection_slug == "-":
    sys.exit(f"error: collection slug is empty (check --collection={args.collection!r})")
```

**M-5.2.3 — MEDIUM — `--date` is unvalidated.** Line 313. A user can pass `--date "last tuesday"` and it flows into the YAML unchanged. Parse it as an `int` in `[1, 2100]`.

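With `argparse`, the validation can live in a `type=` callable, so bad input is rejected with a normal usage error before anything is written. This sketch assumes `--date` holds a year, as the `[1, 2100]` range implies. `year` and `build_parser` are hypothetical names. With it, `build_parser().parse_args(["--date", "last tuesday"])` exits with status 2 and a usage message.

```python
import argparse


def year(text: str) -> int:
    """argparse type: accept only an integer year in [1, 2100]."""
    try:
        value = int(text)
    except ValueError:
        raise argparse.ArgumentTypeError(f"not an integer year: {text!r}") from None
    if not 1 <= value <= 2100:
        raise argparse.ArgumentTypeError(f"year out of range [1, 2100]: {value}")
    return value


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical stand-in for import-poetry.py's parser; only --date is shown.
    parser = argparse.ArgumentParser(description="import-poetry.py (sketch)")
    parser.add_argument("--date", type=year, help="publication year, e.g. 1923")
    return parser
```
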
**M-5.2.4 — MEDIUM — `roman_to_int` has no explicit bounds check.** Lines 45-52. A regex at the call site guards it, which is fine today, but the function should validate its own input.

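A self-validating version can reject anything that is not a canonical numeral, which also bounds the result to 1 through 3999. This Python sketch uses the standard canonical-form pattern. It keeps the existing function's name, but the body is a suggested replacement, not the current implementation.

```python
import re

_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}

# Canonical Roman numerals from 1 to 3999: thousands, hundreds, tens, ones.
_CANONICAL_RE = re.compile(r"M{0,3}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})")


def roman_to_int(numeral: str) -> int:
    """Convert a canonical Roman numeral (1-3999) to an int; raise ValueError otherwise."""
    s = numeral.strip().upper()
    if not s or not _CANONICAL_RE.fullmatch(s):
        raise ValueError(f"not a canonical Roman numeral: {numeral!r}")
    total = 0
    for i, ch in enumerate(s):
        value = _VALUES[ch]
        # Subtractive notation: a smaller value before a larger one counts negative.
        if i + 1 < len(s) and _VALUES[s[i + 1]] > value:
            total -= value
        else:
            total += value
    return total
```
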
**L-5.2.5 — LOW — `write_text(content, encoding="utf-8")` with no `errors=` argument.** UTF-8 can encode every code point except lone surrogates, which can sneak in via `surrogateescape`-decoded input. The default error handler is already `"strict"`, so such input raises. Make the choice explicit: keep `errors="strict"` or pass `errors="replace"` intentionally.

### 5.3 `tools/viz_theme.py`

**L-5.3.1 — LOW — `save_svg` has no `try/finally` around `plt.close(fig)`.** If `savefig` raises, the figure is never closed and matplotlib state leaks. This is a standalone CLI tool, so the impact is one figure.

### 5.4 `pyproject.toml` / `uv.lock`

**M-5.4.1 — MEDIUM — Pins have no upper bounds.** `torch>=2.5`, `sentence-transformers>=3.4`, `faiss-cpu>=1.9`, `numpy>=2.0`. A future major release can silently break the build the next time the lock is regenerated. Cap each below its next major version (for example `torch>=2.5,<3`, `numpy>=2.0,<3`).

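To keep the bounds from eroding, CI could flag any dependency without an upper limit. This Python sketch works on the requirement strings from `[project].dependencies`; loading them (for example with `tomllib` on Python 3.11+) is left out. `missing_upper_bounds` is a hypothetical name. Following PEP 440 specifier semantics, it treats `<`, `<=`, `==`, and `~=` as imposing a ceiling.

```python
import re

# Specifier operators that cap the version (PEP 440): "<" and "<=" directly,
# "==" / "===" by pinning, and "~=" (compatible release) implicitly.
_CEILING_RE = re.compile(r"<|==|~=")


def missing_upper_bounds(requirements: list[str]) -> list[str]:
    """Return the requirement strings whose version specifiers have no ceiling."""
    flagged = []
    for req in requirements:
        spec = req.split(";", 1)[0]  # ignore environment markers such as python_version < "3.12"
        if not _CEILING_RE.search(spec):
            flagged.append(req)
    return flagged
```
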
---

## 6. Shell scripts and `Makefile`

### 6.1 `Makefile`

**M-6.1.1 — MEDIUM — `make deploy` pushes to GitHub *before* rsync.** Lines 57-58:

```makefile
deploy: clean build sign
	git push -u origin main
	rsync -avz --delete _site/ $(VPS_USER)@$(VPS_HOST):$(VPS_PATH)/
```

If `rsync` fails, the GitHub push has already succeeded, leaving the remote ahead of the deployed site. Reversing the order (rsync first, then push only if it succeeds) would be safer, though `make` won't roll anything back either way.

**M-6.1.2 — MEDIUM — `deploy` uses `$(VPS_USER)`, `$(VPS_HOST)`, and `$(VPS_PATH)` without defining them in the Makefile.** They must come from `.env`. If all three are unset, the destination collapses to `@:/` and rsync fails obscurely. Worse, if only `VPS_PATH` is unset, the destination becomes `user@host:/`, and `--delete` then targets the remote filesystem root. Add a guard:

```makefile
deploy: clean build sign
	@test -n "$(VPS_USER)" || (echo "VPS_USER not set" >&2; exit 1)
	@test -n "$(VPS_HOST)" || (echo "VPS_HOST not set" >&2; exit 1)
	@test -n "$(VPS_PATH)" || (echo "VPS_PATH not set" >&2; exit 1)
	...
```

**M-6.1.3 — MEDIUM — `build` commits content before building but never cleans up on failure.** Lines 9-10:

```makefile
	@git add content/
	@git diff --cached --quiet || git commit -m "auto: $$(date -u +%Y-%m-%dT%H:%M:%SZ)"
```

If the subsequent `cabal run site -- build` fails, the commit is already in place. Later `make build` retries see a clean diff and succeed. History then records an auto-commit whose content never built, and nothing marks the original failure. In practice the impact is low: the habit of running `make clean && make build`, plus the `deploy: clean build` dependency, makes this infrequent.

**L-6.1.4 — LOW — `clean` only runs `cabal run site -- clean`.** That removes `_site` and `_cache` (Hakyll's store), but not `dist-newstyle/` (cabal build output) or the embeddings under `data/` (gitignored, but they can go stale). This is arguably correct as designed, since a deep clean is `git clean -fdX`. That command also deletes the gitignored `.env`, so use it with care. Document both points.

**L-6.1.5 — LOW (retracted) — `pdf-thumbs` recipe `-nt` test.** This was first flagged as an unquoted `$$pdf` that would misparse filenames containing spaces, but the recipe already quotes both operands:

```makefile
if [ ! -f "$${thumb}.png" ] || [ "$$pdf" -nt "$${thumb}.png" ]; then
```

False alarm; no change needed.

**L-6.1.6 — LOW — The bare `export` on line 6 is blunt.** It exports every Make variable, including everything loaded from `.env`, into the environment of every recipe. For a solo build this is fine; just be aware of the scope.

**NIT — `build-start.txt` uses file I/O instead of a Make variable.** Lines 11 and 23. A single recipe could use shell arithmetic, avoiding the scratch file (and the need to gitignore it).

### 6.2 `tools/sign-site.sh`

**L-6.2.1 — LOW — `xargs -P $(nproc)` leaves the job count implicit.** Filename handling is fine: `-0` with `-I{}` copes with spaces and other unusual characters. The oddity is the job count. If `$(nproc)` ever yields `0` (a cgroup edge case), GNU `xargs -P 0` runs as many processes at once as possible. That is arguably fine but non-obvious. Make it explicit and overridable: `-P "${JOBS:-$(nproc)}"`.

**NIT — Hardcoded key fingerprint `C9A42A6FAD444FBE566FD738531BDC1CC2707066`.** Expected, but document that rotating the key requires editing both this script and `preset-signing-passphrase.sh`.

### 6.3 `tools/convert-images.sh`

No issues of note. `set -euo pipefail`, `find -print0 | read -d ''`, and the quoting are all correct.

### 6.4 `tools/download-model.sh`

**L-6.4.1 — LOW — No checksum verification on the downloaded ONNX model.** Line 26: `curl -fsSL $BASE_URL/$src -o $dst`. If Hugging Face is compromised, or simply serves a different build of the model, the site ships an unexpected (possibly tampered) model without warning. Pin the expected SHA-256 for each file and verify it after download.

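In the shell script itself, the natural fix is `sha256sum -c` against a checked-in list of expected digests. Where download logic lives in Python instead, the same check is a few lines of `hashlib`. In this sketch, `sha256_of` and `verify_sha256` are hypothetical helpers. The expected digest would be recorded once from a known-good download.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 of a file, read in chunks so large models are not loaded into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sha256(path: Path, expected_hex: str) -> None:
    """Raise ValueError (after deleting the file) if its digest does not match."""
    actual = sha256_of(path)
    if actual != expected_hex.strip().lower():
        path.unlink()  # don't leave an unverified model where the site would pick it up
        raise ValueError(f"{path}: SHA-256 mismatch (expected {expected_hex}, got {actual})")
```
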
**NIT — Hardcoded Hugging Face URL.** Document why the script hardcodes this URL (no official mirror is available) and why it pulls from `resolve/main` rather than a pinned revision. Hugging Face also serves files at `resolve/<commit-sha>/`, which would pin one (see L-6.4.1).

### 6.5 `tools/subset-fonts.sh`

**L-6.5.1 — LOW — Paths are Arch-specific.** `/usr/share/fonts/ttf-spectral`, `/usr/share/fonts/TTF`. On Debian/Ubuntu, JetBrains Mono lives at `/usr/share/fonts/truetype/jetbrains-mono/`. Detect the paths via `fc-match`, or document the script as Arch-only.

### 6.6 `tools/preset-signing-passphrase.sh`, `tools/refreeze.sh`

Clean. No material issues.

---

## 7. Repository hygiene and configuration

### 7.1 `.gitignore`

**L-7.1.1 — no issue — vim swap files (`*.swp`, `*.swo`) are already ignored.** ✓

**L-7.1.2 — no issue — `dist-newstyle/`, `_site/`, `_cache/`, `.env`, and `IGNORE.txt` are all correctly ignored.** ✓

**NIT — `paper/` is tracked, but its purpose is unclear.** It is outside this audit's scope but deserves a README.

### 7.2 Files in repo root that shouldn't be

- `BeyondComorbidityIndices.docx` (3.8 MB) + `BeyondComorbidityIndicesSupplement.docx` (1.6 MB): untracked, but 5.4 MB of binary clutter at the project root. Move them into `paper/` or `drafts/`.
- `HOMEPAGE.md~`: an empty editor backup, gitignored but still on disk.
- `HOMEPAGE.md`, `WRITING.md`, `migrate_html.md`: workspace notes without a home. Consider `docs/` or `notes/`.
- `content/modern_idolatry.md`: an untracked Markdown file in `content/` but outside `content/drafts/`. Either move it under drafts or commit it.
- `IGNORE.txt`: exists (empty), is gitignored, and is used by the stability pin mechanism. Clean.

### 7.3 `levineuwirth.cabal`

**L-7.3.1 — LOW — `-Wno-unused-imports` masks real unused imports.** It is set at the executable level, and it hid the `Metadata.hs` no-op and the unused `Data.List.intercalate` imports in several modules. Delete the flag and fix the warnings.

**L-7.3.2 — LOW — Upper bounds such as `< 4.17` on Hakyll and `< 3.7` on Pandoc confine the project to a specific release window.** Good discipline, but document the refreeze cadence (`tools/refreeze.sh` exists; reference it in the README).

**NIT — `bytestring < 0.13`** is ambitiously loose; the Pandoc ecosystem tends to follow `bytestring < 0.12`. Verify with `cabal outdated --v2-freeze-file`.

### 7.4 `cabal.project`

`-O1` for the build program is the right call: Hakyll build time is dominated by Pandoc, not the wrapper. ✓

### 7.5 `pyproject.toml`

See finding M-5.4.1. Otherwise clean.

### 7.6 README

**M-7.6.1 — MEDIUM — `README.md` is a single line: `# levineuwirth.org`.** The project has a 63 KB `spec.md`, multiple build flows, optional features (`.venv` for embeddings, `download-model` for semantic search), a signing setup, and an rsync deployment target. None of this is documented. A new contributor (or future-you after a two-year hiatus) cannot get started from what's here.

Minimum viable README:

1. One-sentence description.
2. `make build`, `make dev`, `make deploy` entrypoints.
3. Optional: `.venv` setup via `uv sync` for embeddings; `make download-model` for client-side semantic search.
4. `.env` format (link to `.env.example`).
5. Pointer to `spec.md` for architecture.

---

## 8. Cross-cutting observations

### 8.1 Duplicate code

Several helpers are implemented more than once:

- **HTML escaping:** at least five independent implementations (`Utils.hs`, `Images.hs`, `Score.hs`, `Smallcaps.hs`, `Viz.hs`), plus 3× in JS.
- **`trim`:** at least four (`Transclusion`, `EmbedPdf`, `Wikilinks`, plus `Contexts.strip`).
- **`slugify` / `authorSlugify`:** two.
- **`stringify`:** two (`Compilers.hs`, `Images.hs`), alongside `Text.Pandoc.Shared.stringify` in the library itself.
- **`normaliseUrl`:** two (`Backlinks.hs`, `SimilarLinks.hs`). They are **almost** identical but handle `index.html` differently, so they cannot be naively merged.

Recommendation: create a `build/Common.hs` (or a separate `build/Text.hs`) for `escapeHtml`, `trim`, and `stringify`, and consolidate where possible.

### 8.2 Partial functions

Partial functions (`last`, `head`, `!!`, `fromJust`) are used in several places behind explicit guards. All are safe under their current guards, but the pattern is riskier than case analysis. Audit: `Contexts.hs:263`, `Citations.hs:142`, `Stats.hs:125` (median), `Stability.hs:75`, `Catalog.hs:194`.

### 8.3 Error handling consistency

There are two different patterns:

- `Score.hs` and older filters: `readFile` throws, aborting the build with a bare IO exception and no context about which document triggered it.
- `Viz.hs` and `Stability.hs`: `catch` + `errorBlock` / fallback.

Standardize on the second pattern across all IO-performing filters.

### 8.4 Trust boundary is unstated

The codebase leans on a "the author writes and reviews everything" assumption for:

- Frontmatter metadata (used raw in HTML by `Catalog.hs`, `Stats.hs`, `Contexts.hs`).
- Wikilink / transclusion slugs (used raw in HTML by `Transclusion.hs`).
- Bibliography entries (used raw in HTML by `Citations.hs`, `Backlinks.hs`).

This is a defensible design for a single-author site. Document it in `spec.md`. If a PR from a collaborator is ever accepted, or user-provided input ever feeds any of these fields, the trust boundary needs to be revisited across all of these call sites at once.

### 8.5 Build reproducibility

- Python dependencies have no upper bounds (M-5.4.1).
- The model download (`tools/download-model.sh`) is not pinned by SHA (L-6.4.1).
- KaTeX and Vega are loaded from a CDN (`templates/partials/head.html:26, 34-36`) without SRI hashes.
- The Pandoc version is bounded (`>= 3.1 && < 3.7`), which is good, but citeproc behavior varies subtly across that range.

Add SRI to CDN assets; pin the ONNX model to a specific revision + SHA; tighten the Python pins.

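An SRI value is the hash algorithm name, a dash, and the base64-encoded digest of the exact bytes served. It goes in the `integrity` attribute, and cross-origin loads also need `crossorigin="anonymous"`. Below is a Python sketch for generating one from a downloaded copy of each CDN asset. `sri_hash` and the `sri.py` script name are hypothetical, and SHA-384 is simply the common choice. The shell equivalent is `openssl dgst -sha384 -binary FILE | openssl base64 -A`.

```python
import base64
import hashlib
import sys
from pathlib import Path


def sri_hash(data: bytes, algorithm: str = "sha384") -> str:
    """Return a Subresource Integrity value such as "sha384-<base64 digest>"."""
    if algorithm not in ("sha256", "sha384", "sha512"):  # the algorithms SRI defines
        raise ValueError(f"unsupported SRI algorithm: {algorithm}")
    digest = hashlib.new(algorithm, data).digest()
    return f"{algorithm}-{base64.b64encode(digest).decode('ascii')}"


if __name__ == "__main__":
    # Usage (hypothetical script name): python sri.py katex.min.js vega.min.js
    for name in sys.argv[1:]:
        print(f'{name}: integrity="{sri_hash(Path(name).read_bytes())}"')
```
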
### 8.6 Accessibility posture

The foundation is strong (skip link, ARIA on nav, semantic elements, reduced-motion support), with localized gaps:

- Gallery overlay focus trap (H-3.3.1)
- Collapsed TOC (H-4.1.3)
- Dark-mode text-faint contrast (H-4.1.2)
- Keyboard-only equivalents for sidenote and annotation-picker interactions (M-3.3.2, M-3.3.3)

Addressing H-3.3.1, H-4.1.3, and H-4.1.2 alone would raise the overall a11y grade meaningfully.

---

## 9. Fix priority (recommended order)

### P0 — correctness blockers

1. `Filters/Images.hs:110` — `lowerExt` bug. One-line fix (`takeExtension`). Restores the entire WebP pipeline.
2. `Commonplace.hs:126-131` — parenthesize the `if`. One line.
3. `tools/embed.py:68-73` — fix root URL. Three lines.
4. `Authors.hs:50` — add `content/essays/*/index.md` to `allContent`.
5. `Filters/Sidenotes.hs:38` — numeric labels (or error on >26).

### P1 — silent-failure hardening

6. `Filters/Score.hs:40` — missing file handling.
7. `Filters/Viz.hs:96` — missing file handling.
8. `Filters/Images.hs:77` — dedup `passedKvs` blacklist.
9. `Filters/Links.hs:59` — proper hostname match.
10. `tools/import-poetry.py:193` — escape newlines in YAML strings.

### P2 — accessibility

11. Dark-mode `--text-faint` contrast.
12. Gallery focus trap.
13. TOC collapsed-state keyboard access.
14. Global `:focus-visible` styles.

### P3 — hygiene and refactor

15. Missing CSS variables (`--rule`, `--font-ui`, `--bg-subtle`).
16. Consolidate duplicate `escapeHtml`/`trim`/`stringify`.
17. `README.md` with actual contents.
18. Delete `build/Metadata.hs` or populate.
19. Remove `-Wno-unused-imports` from `levineuwirth.cabal` and fix what surfaces.
20. Relocate `.docx` binaries out of repo root.

### P4 — nice to have

21. Reproducibility: SRI on CDN, pinned ONNX, tightened Python bounds.
22. Consolidate `Backlinks`/`Authors`/`Tags`/`Site` content patterns into a single `Patterns.hs`.
23. Defense-in-depth escaping in `Transclusion.hs`, `Catalog.hs`.
24. `make deploy` guard for `VPS_*` variables.

---

## Appendix A — files scanned in full

- **Haskell (build system):** `Main.hs`, `Site.hs`, `Contexts.hs`, `Stats.hs`, `Backlinks.hs`, `Compilers.hs`, `Citations.hs`, `Stability.hs`, `Catalog.hs`, `Commonplace.hs`, `Authors.hs`, `Tags.hs`, `Pagination.hs`, `SimilarLinks.hs`, `Utils.hs`, `Metadata.hs`, `Filters.hs`.
- **Haskell (filters):** `Filters/Images.hs`, `Filters/Transclusion.hs`, `Filters/Score.hs`, `Filters/Viz.hs`, `Filters/Sidenotes.hs`, `Filters/Links.hs`, `Filters/Wikilinks.hs`, `Filters/EmbedPdf.hs`. Others via parallel audit.
- **JavaScript:** 20 files under `static/js/` via parallel audit (prism.min.js excluded as vendor).
- **CSS:** 22 files under `static/css/` via parallel audit.
- **Templates:** `default.html`, `partials/head.html`, `partials/nav.html`, plus the full template tree via parallel audit.
- **Python:** `tools/embed.py`, plus `tools/import-poetry.py`, `tools/viz_theme.py` via parallel audit.
- **Shell:** `tools/convert-images.sh`, `tools/sign-site.sh`, `tools/download-model.sh`, `tools/subset-fonts.sh`, `tools/preset-signing-passphrase.sh`, `tools/refreeze.sh`, `Makefile`.
- **Config:** `levineuwirth.cabal`, `cabal.project`, `pyproject.toml`, `.gitignore`, `.env.example`.

## Appendix B — what was not audited

- `templates/partials/metadata.html`, `footer.html`, `page-footer.html`, `paginate-nav.html` — inspected briefly via the CSS/template sub-audit only.
- `static/css/build.css` — cited by the CSS audit for undefined variable usage; rules not fully traced.
- `data/*.bib`, `data/*.csl` — treated as data, not audited for CSL correctness.
- `content/**/*.md` — authored content, out of scope.
- `_site/`, `_cache/`, `dist-newstyle/`, `.venv/` — build outputs.
- `spec.md` — design document, referenced but not audited line-by-line.
- `prism.min.js`, `pagefind` output, KaTeX, Vega — vendor / third-party.

— End of audit —