**Question.** Among adult hospitalizations in a national claims database, does a deep learning model using ICD-10-CM diagnosis codes improve prediction of 30-day unplanned readmission and 30-day postdischarge in-hospital mortality compared with benchmark models based on Charlson and Elixhauser comorbidity indices?
**Findings.** In this cohort study of 3,226,831 temporally held-out discharges, the ICD-10-CM--based model showed better discrimination than benchmark comorbidity-index models for both outcomes.
**Meaning.** Using the full set of discharge diagnosis codes may improve short-term claims-based outcome prediction beyond summary comorbidity indices.
history:
- date: "2026-03-28"
note: Preprint auto-formatted for levineuwirth.org
---
::: {.annotation .annotation--static}
<divclass="annotation-header">
<spanclass="annotation-label">Summary</span>
<spanclass="annotation-name">Key points</span>
</div>
<divclass="annotation-body">
**Question.** Among adult hospitalizations in a national claims database, does a deep learning model using ICD-10-CM diagnosis codes improve prediction of 30-day unplanned readmission and 30-day postdischarge in-hospital mortality compared with benchmark models based on Charlson and Elixhauser comorbidity indices?
**Findings.** In this cohort study of 3,226,831 temporally held-out discharges, the ICD-10-CM--based model showed better discrimination than benchmark comorbidity-index models for both outcomes.
**Meaning.** Using the full set of discharge diagnosis codes may improve short-term claims-based outcome prediction beyond summary comorbidity indices.
</div>
:::
## Introduction (Background and Significance)
::: dropcap
@ -170,8 +178,6 @@ The analytic code (including the non-elective readmission implementation, model
The authors report no conflicts of interest related to this work.
::: aftermatter
## Tables
### Table 1: Performance metrics for the ICD model vs. CCI and ECI {#table-1}
@ -363,5 +369,3 @@ Results are shown in eTable 3, Panel C. ICD-only variants had slightly worse AUR
**(B)**

date: 2026-04-15 # required; used for ordering, feed, and display
abstract: > # optional; shown in the metadata block and link previews
AI labs are likely deliberately reluctant to scale because they are aware that any imminient shift to locally run models as the norm would render their compute redundant. We take Anthropic as a principal case study to validate this hypothesis.
tags: # optional; see Tags section
- ai
- tech
- speculative
- open
# Epistemic profile — all optional; the entire section is hidden unless `status` is set
status: "Working model" # Draft | Working model | Durable | Refined | Superseded | Deprecated
confidence: 55 # 0–100 integer (%)
importance: 3 # 1–5 integer (rendered as filled/empty dots ●●●○○)
evidence: 1 # 1–5 integer (same)
scope: broad # personal | local | average | broad | civilizational
practicality: high # abstract | low | moderate | high | exceptional
confidence-history: # list of integers; trend arrow derived from last two entries
---
Running a lab that developers frontier LLMs is somewhat like playing a game that, by all measurable metrics external, you are bound to lose. The amount of compute required to train a frontier LLM is unbelievably expensive. The expense of inference is even more astronomical. OpenAI claims at the time of this writing to have somewhere between 900 Million and 1 Billion active users, all of whom require some amount of inference cost, and some small subset of whom consume an enormous amount of compute - to use their words, this is ["commercial scale."](https://openai.com/index/accelerating-the-next-phase-ai/). This isn't to mention the immense amount of competition - there are many major players in the United States alone contributing models that push the boundaries. OpenAI may have been the first, but Anthropic, Google, Meta, xAI, and, yes, even Amazon and Bytedance are following right along.
Then there's the news that the stock market doesn't want to hear. Ask yourself: who is deliberately left off the above list? If you're thinking of models like GLM, Qwen, MiniMax, and the notorious Deepseek, then we're on the same page. These models are rapidly approaching the capabilities of the frontier models that remain behind intrusive "competitive moats"^[This phrasing is adopted from Jared James Grogan's 2026 paper ["The End of the Foundation Model Era](https://arxiv.org/abs/2604.06217)] that do little more than violate the rights of their users. The advantages that such models provide are immense, and labs of the first list cannot ignore the likelihood of their precedence increasing in the weeks and months to come. In fact, I hypothesize that we are already seeing the reaction of frontier labs to these increasing capabilities, through the lense of juxtaposition: the jargon has remained constant, as if to negate any possibility of an "AI Bubble" bursting, but the quiet actions of the companies that aren't notoriously announced and decreed have shifted.
**Status:** LIVING DOCUMENT — Updated as implementation progresses.
---
## I. Vision & Philosophy
This website is an **intellectual home** — the permanent residence of a mind that moves freely between computer science, music composition, poetry, fiction, and whatever else catches fire.
### Commitments
1. **Long content over disposable content.** Essays are living documents.
2. **Semantic zoom.** Title → abstract → headers → body → sidenotes → citations → sources.
3. **Earned ornament.** Every decorative element serves a purpose.
4. **The site is the proof.** Entirely FOSS. No tracking. No analytics. No fingerprinting.
5. **Reader > Author.**
6. **Configuration is code.** The build system is a Haskell program.
7. **No homepage epigraph.**
8. **Extensible metadata.** Future-proofed for semantic embeddings via external JSON injection.
---
## II. All Resolved Decisions
### Typography
| Role | Font | License | Notes |
|------|------|---------|-------|
| **Body** | **Spectral** | SIL OFL | Screen-first serif. True smallcaps (`smcp`), four figure styles, ligatures, seven weights + italics. Self-hosted from source — Google Fonts strips OT features. |
- **Center column:** Body text in Spectral. 650–700px max-width.
- **Right margin:** Sidenotes only (right column).
### Color
Pure monochrome. No accent color. Light mode default (`#faf8f4` background, `#1a1a1a` text). Dark mode via `[data-theme="dark"]` + `prefers-color-scheme`.
### Content Systems
- **Tag system:** Hierarchical, slash-separated (`research/mathematics`). Hakyll `buildTags` + custom hierarchy. Tag pages at `/<tag>/` with no `/tags/` namespace prefix.
- **Pagination:** Blog index 20/page, tag pages 20/page. Essay index all on one page.
- **RSS:** Atom feed at `/feed.xml` (all content types, sorted by `date`) and `/music/feed.xml` (compositions only).
- **Citations:** Numbered superscript markers `[1]` linked to a bibliography section. Hover preview via `citations.js`. Further Reading section separate from cited works. `data/bibliography.bib` + Chicago Author-Date CSL.
- **Collapsible sections:** h2/h3 headings toggle their content via `collapse.js`. Smooth `max-height` transition. State persisted in `localStorage`.
### Gwern Codebase: Selective Adoption
| Component | Action | Actual outcome |
|-----------|--------|----------------|
| `sidenotes.js` | Adopt directly (Said Achmiz, MIT) | **Written from scratch** — purpose-built for our HTML structure |
| `popups.js` | Fork and simplify (Said Achmiz, MIT) | Exists in `static/js/popups.js`; Phase 3 |
Extensible YAML frontmatter. Hakyll strips frontmatter before passing to Pandoc, so all frontmatter access goes through Hakyll's metadata API (`lookupStringList`, `getMetadataField`, etc.), not through Pandoc `Meta`.
**Frontmatter keys in use:**
```yaml
title: # page title
date: # ISO date (YYYY-MM-DD) — used for sorting, feed, reading-time
abstract: # short description (1–3 sentences). Rendered via Pandoc to support LaTeX math and Markdown.
tags: # hierarchical tag list
authors: # list of author names (defaults to Levi Neuwirth)
further-reading: # list of BibTeX keys for the Further Reading section
bibliography: # path to .bib file (optional; defaults to data/bibliography.bib)
csl: # path to .csl file (optional; defaults to data/chicago-notes.csl)
repository: # external URL pointing to the content's source code or data repository
# Epistemic profile (all optional; section shown only if `status` is present)
status: # Draft | Working model | Durable | Refined | Superseded | Deprecated
confidence: # 0–100 integer (%)
importance: # 1–5 integer (rendered as filled/empty dots)
evidence: # 1–5 integer (rendered as filled/empty dots)
scope: # personal | local | average | broad | civilizational
**`IGNORE.txt`:** A file in the project root listing content paths (one per line) whose `stability` and `last-reviewed` should not be recomputed. Cleared automatically after every `make build`. Useful for pinning manually-set stability labels on pages whose git history is misleading.
**Top metadata block:**
1. **Tags** — hierarchical tag list with links to tag index pages
2. **Description** — the `abstract` field, rendered via Pandoc (supporting LaTeX math and Markdown formatting), typically in italic
- [x] Pagination (blog index and tag pages, 20/page)
- [x] Metadata: YAML frontmatter + auto-computed word count / reading time
- [x] Single Atom feed (`/feed.xml`, all content, sorted by date)
- [x] External link icons (SVG mask-image, domain-classified via `Filters.Links`)
- [x] Gallery / Exhibit system (`gallery.js`, `gallery.css`) — added (not in original spec)
### Phase 3: Rich Interactions
- [x] Link preview popups (`popups.js`) — internal page previews (title, abstract, authors, tags, reading time), Wikipedia excerpts, citation previews; relative-URL fix for index pages
- [x] Pagefind search (`/search.html`) — `search.js` pre-fills from `?q=` param; `#search-timing` shows elapsed ms (mono, faint) via `MutationObserver` on search results subtree
- [x] Author system — authors treated as tags; `build/Authors.hs`; author pages at `/authors/{slug}/`; `authorLinksField` in all contexts; defaults to Levi Neuwirth
- [x] Settings panel — `settings.js` + `settings.css` section in `components.css`; theme, text size (3 steps), focus mode, reduce motion, print; all state in `localStorage`; `theme.js` restores all settings before first paint
- [x] Selection popup — `selection-popup.js` / `selection-popup.css`; context-aware toolbar appears 450 ms after text selection; see Implementation Notes
- [x] Fiction reading mode — same codex layout; `fictionCompiler`; chapter drop caps + smallcaps lead-in via `h2 + p::first-letter`; reading mode infrastructure shared with poetry
- [x] Music section — score fragment system (A): inline SVG excerpts (motifs, passages) integrated into the gallery/exhibit system; named, TOC-listed, focusable in the shared overlay alongside equations; authored via `{.score-fragment score-name="..." score-caption="..."}` fenced-div; SVG inlined at build time by `Filters.Score`; black fills/strokes replaced with `currentColor` for dark-mode; see Implementation Notes
- [x] Music section — composition landing pages + full score reader (C): two-URL architecture per composition; `/music/{slug}/` (rich prose landing page with movement list, audio players, inline score fragments) and `/music/{slug}/score/` (minimal dedicated reader); Hakyll `version "score-reader"` mechanism; `compositionCtx` with `slug`, `score-url`, `has-score`, `score-page-count`, `score-pages` list, `has-movements`, `movements` list (Aeson-parsed nested YAML); `score-reader-default.html` minimal shell; `score-reader.js` (page navigation, movement jumps, `?p=` deep linking, preloading, keyboard); `score-reader.css`; dark mode via `filter: invert(1)`; see Implementation Notes
- [x] Accessibility audit — skip link, TOC collapsed-link tabbing (`visibility: hidden`), section-toggle focus visibility, lightbox/gallery/settings focus restoration, popup `aria-hidden`, metadata nav wrapping, footer `onclick` removal; settings panel focus-steal bug fixed (focus only returns to toggle when it was inside the panel, preventing interference with text-selection popup)
- [~] Visualization pipeline — Pandoc filter approach (`Filters.Viz`): `.figure` fenced divs run `python3 <script>`, capture SVG stdout, inline with `currentColor` replacement; `.visualization` fenced divs embed Vega-Lite JSON in a `<script type="application/json" class="vega-spec">` tag rendered by `viz.js`; `viz: true` frontmatter gates CDN Vega/Vega-Lite/Vega-Embed + `viz.js`; dark mode re-renders via `MutationObserver`; `tools/viz_theme.py` provides matplotlib monochrome helpers. Infrastructure complete; not yet used in production content.
- [ ] Content migration — migrate existing essays, poems, fiction, and music landing pages from prior formats into `content/`
### Phase 5: Infrastructure & Advanced
- [x] **Arch Linux VPS + nginx + certbot + DNS migration** — Hetzner VPS provisioned, Arch Linux installed, nginx configured (config in §III), TLS cert via certbot, DNS migrated from DreamHost. `make deploy` pushes to GitHub and rsyncs to VPS.
- [x] **Semantic embedding pipeline** — Implemented. See Phase 6 "Embedding-powered similar links" and "Full-text semantic search".
- [x] **Backlinks with context** — Two-pass build-time system (`build/Backlinks.hs`). Pass 1: `version "links"` compiles each page lightly (wikilinks preprocessed, links + context extracted, serialised as JSON). Pass 2: `create ["data/backlinks.json"]` inverts the map. `backlinksField` in `essayCtx` / `postCtx` loads the JSON and renders `<details>`-collapsible per-entry lists. `popups.js` excludes `.backlink-source` links from the preview popup. Context paragraph uses `runPure . writeHtml5String` on the surrounding `Para` block. See Implementation Notes.
- [ ] **Link archiving** — For all external links in `data/bibliography.bib` and in page bodies, check availability and save snapshots (Wayback Machine `save` API or local archivebox instance). Store archive URLs in `data/link-archive.json`; `Filters.Links` injects `data-archive-url` attributes; `popups.js` falls back to the archive if the live URL returns 404.
- [ ] **Self-hosted git (Forgejo)** — Run Forgejo on the VPS. Mirror the build repo. Link from the colophon. Not essential; can remain on GitHub indefinitely.
- [ ] **Reader mode** — Distraction-free reading overlay: hides nav, TOC, sidenotes; widens the body column to ~70ch; activated via a keyboard shortcut or settings panel toggle. Distinct from focus mode (which affects the nav) — reader mode affects the content layout.
- [ ] **HTTP/3 + QUIC** — nginx 1.25+ supports HTTP/3 via `listen 443 quic reuseport` + `http3 on` + `Alt-Svc` header. Requires UDP 443 open in Hetzner's Cloud Firewall. Deferred: latency is currently geographic RTT, not server processing; gains would be modest for a static site from a single DC. CDN alternatives (Bunny.net, multi-region Hetzner with GeoDNS) would address the root cause but raise ethical or operational complexity concerns. Revisit if latency becomes a real user complaint. Server-side improvements (brotli pre-compression, `open_file_cache`) are a lower-cost step first.
### Phase 6: Deferred Features
- [x] **Annotation UI** — `annotations.js` / `annotations.css`: localStorage storage, text-stream re-anchoring, four highlight colors (amber/sage/steel/rose), hover tooltip with delete. Selection popup "Annotate" button triggers a color-swatch + optional note picker; Enter or "Highlight" button commits; Escape cancels. Picker positioned above the selection, same inverted style as the tooltip. Settings panel includes a "Clear Annotations" button (with confirmation) that wipes all annotations site-wide via `Annotations.clearAll()`.
- [~] **Visualization pipeline** — Implemented as a Pandoc IO filter (`Filters.Viz`), not a per-slug Hakyll rule. See Phase 4 entry and Implementation Notes. Infrastructure complete; production content pending.
- [x] **Music catalog page** — `/music/` index listing all compositions grouped by instrumentation category (orchestral → chamber → solo → vocal → choral → electronic → other), with an optional Featured section. Auto-generated from composition frontmatter by `build/Catalog.hs`; renders HTML in Haskell (same pattern as backlinks). Category, year, duration, instrumentation, and ◼/♫ indicators for score/recording availability. `content/music/index.md` provides prose intro + abstract. Template: `templates/music-catalog.html`. CSS: `static/css/catalog.css`. Context: `musicCatalogCtx` (provides `catalog: true` flag, `featured-works`, `has-featured`, `catalog-by-category`).
- [x] **Score reader swipe gestures** — `touchstart`/`touchend` listeners on `#score-reader-stage` with passive: true. Threshold: ≥ 50 px horizontal, <30pxverticaldrift.Leftswipe→nextpage;rightswipe→previouspage.
- [x] **Full-piece audio on composition pages** — `recording` frontmatter key (path relative to the composition directory). Rendered as a full-width `<audio>` player in `composition.html`, above the per-movement list. Styled via `.comp-recording` / `.comp-recording-audio` in `components.css`. Per-movement `<audio>` players and `.comp-btn` / `.comp-movement-*` styles also added in the same pass.
- [x] **RSS/feed improvements** — `/feed.xml` now includes compositions (`content/music/*/index.md`) alongside essays, posts, fiction, poetry. New `/music/feed.xml` (compositions only, `musicFeedConfig`). Compositions already had `"content"` snapshots saved by the landing-page rule; no compiler changes needed.
- [ ] **Pagefind improvements** — Currently a basic full-text search. Consider: sub-result excerpts, portal-scoped search filters, weighting by `importance` frontmatter field.
- [ ] **Audio essays / podcast feed** — Record readings of select essays. Embed a native `<audio>` player at the top of the essay page, activated by an `audio` frontmatter key (path to MP3, relative to the content dir). Generate a separate `/podcast.xml` Atom feed with `<enclosure>` elements pointing to the MP3s so readers can subscribe in any podcast app. Stretch goal: a paragraph-sync mode where the player emits `timeupdate` events that highlight the paragraph currently being read — requires a `data/audio/{slug}-timestamps.json` file mapping paragraph indices to timestamps, authored manually or via forced-alignment tooling (e.g. `whisper` with word timestamps).
- [x] **Build telemetry page** — `/build/` page generated at build time. `build/Stats.hs` loads all content items by type, reads `"word-count"` snapshots, aggregates counts/words/reading-time per type, computes word-length distribution (5 buckets), and reads top-15 tags from the `Tags` object. Makefile writes `date +%s` to `data/build-start.txt` before Hakyll runs; after pagefind, computes elapsed and writes `data/last-build-seconds.txt` (read on next build). CSS in `static/css/build.css` (flex bar chart, tabular-nums table, grid dl); loaded conditionally via `$if(build)$` in head.html.
- [x] **Epistemic profile** — Replaces the old `certainty` / `importance` fields with a richer multi-axis system. **Compact** (always visible in footer): status chip · confidence % · importance dots · evidence dots. **Expanded** (`<details>`): stability (auto) · scope · novelty · practicality · last reviewed · confidence trend. Auto-calculation in `build/Stability.hs` via `git log --follow`; `IGNORE.txt` pins overrides. See Metadata section and Implementation Notes for full schema and vocabulary.
- [ ] **Writing statistics dashboard** — A `/stats` page computed entirely at build time from the corpus. Contents: total word count across all content types, essay/post/poem count, words written per month rendered as a GitHub-style contribution heatmap (SVG generated by Haskell or a Python script), average and median essay length, longest essay, most-cited essay (by backlink count), tag distribution as a treemap, reading-time histogram, site growth over time (cumulative word count by date). All data collected during the Hakyll build from compiled items and their snapshots; serialized to `data/stats.json` and rendered into a dedicated `stats.html` template.
- [x] **Memento mori** — Implemented at `/memento-mori/` as a full standalone page. 90×52 grid of weeks anchored to birthday anniversaries (nested year/week loop via `setFullYear`; week 52 stretched to eve of next birthday to absorb 365th/366th days). Week popup shows dynamic day-count and locale-derived day names. Score fragment (bassoon, `content/memento-mori/scores/bsn.svg`) inlined via `Filters.Score`. Linked from footer (MM).
- [x] **Embedding-powered similar links** — `tools/embed.py` encodes every `#markdownBody` page with `all-MiniLM-L6-v2` (384 dims, unit-normalised), builds a FAISS `IndexFlatIP`, queries top-5 neighbours per page (cosine ≥ 0.30), writes `data/similar-links.json`. `build/SimilarLinks.hs` provides `similarLinksField` in `essayCtx`/`postCtx` with Hakyll dependency tracking; "Related" section rendered in `page-footer.html`. Staleness check skips re-embedding when JSON is newer than all HTML. Called by `make build` via `uv run`; non-fatal if `.venv` absent. See Implementation Notes.
- [x] **Bidirectional backlinks with context** — See Phase 5 above; implemented with full context-paragraph extraction. Merged with the Phase 5 stub.
- [x] **Signed pages / content integrity** — `make sign` (called by `make deploy`) runs `tools/sign-site.sh`: walks `_site/**/*.html`, produces a detached ASCII-armored `.sig` per page. Signing uses a dedicated Ed25519 subkey isolated in `~/.gnupg-signing/` (master `sec#` stub + `ssb` signing subkey). Passphrase cached 24 h in the signing agent via `tools/preset-signing-passphrase.sh` + `gpg-preset-passphrase`; `~/.gnupg-signing/gpg-agent.conf` sets `allow-preset-passphrase`. Footer "sig" link points to `$url$.sig`; hovering shows the ASCII armor via `popups.js``sigContent` provider. Public key at `static/gpg/pubkey.asc` → served at `/gpg/pubkey.asc`. Fingerprints: master `CD90AE96…B5C9663`; signing subkey `C9A42A6F…2707066` (keygrip `619844703EC398E70B0045D7150F08179CFEEFE3`). See Implementation Notes.
- [x] **Self-hosted semantic search model** — `tools/download-model.sh` fetches the quantized ONNX model (`all-MiniLM-L6-v2`, ~22 MB, 5 files) from HuggingFace into `static/models/all-MiniLM-L6-v2/` (gitignored). `semantic-search.js` sets `env.localModelPath = '/models/'` and `env.allowRemoteModels = false` before calling `pipeline()`, so all model weight fetches are same-origin. The CDN import of transformers.js itself still requires `cdn.jsdelivr.net` in `script-src`; `connect-src` stays `'self'`. `make download-model` is a one-time setup step per machine. See Implementation Notes.
- [x] **Responsive images (WebP)** — `tools/convert-images.sh` walks `static/` and `content/`, calls `cwebp -q 85` to produce `.webp` companions alongside every JPEG/PNG (skips existing; exits gracefully if `cwebp` absent). `make build` runs it before Hakyll so WebP files are present when `static/**` is copied. `build/Filters/Images.hs` detects local raster images and emits `RawInline "html"``<picture>` elements with a `<source srcset="…webp" type="image/webp">` and an `<img>` fallback; SVG, external URLs, and `data:` URIs pass through as plain `<img>`. Generated `.webp` files gitignored via `static/**/*.webp` / `content/**/*.webp`. See Implementation Notes.
- [x] **Full-text semantic search** — `tools/embed.py` also produces paragraph-level embeddings (same `all-MiniLM-L6-v2` model, same pass): walks `<p>/<li>/<blockquote>` in `#markdownBody`, tracks nearest preceding heading for context, writes `data/semantic-index.bin` (raw Float32, N×384) + `data/semantic-meta.json` ([{url, title, heading, excerpt}]). Client: `static/js/semantic-search.js` dynamically imports `@xenova/transformers@2` from CDN, embeds the query, brute-force cosine-ranks all paragraph vectors (fast at <5kparagraphsinJS),renderstop-8resultsastitle+sectionheading+excerpt.Surfacedon`/search.html`asa**Keyword / Semantic**tabstrip;activetabpersistsin`localStorage`(keyworddefaultonfirstvisit).Bothtabsonsamepage;`semantic-search.js`loadedwithinexisting`$if(search)$`block.SeeImplementationNotes.
---
## VI. Implementation Notes
### sidenotes.js — Written from scratch
The spec called for adopting Said Achmiz's `sidenotes.js` directly. Instead a purpose-built version was written for the `<span class="sidenote">` structure produced by `Filters.Sidenotes`. Features: JS collision avoidance (`positionSidenotes`), bidirectional hover highlight, click-to-focus (sticky highlight on wide viewport, anchor scroll fallback on narrow), document-click dismissal. `window.resize` is used as the reposition signal; `collapse.js` dispatches it after each section transition.
### Gallery / Exhibit system — Added (not in original spec)
- **Exhibits** (`.exhibit--equation`, `.exhibit--proof`): always-visible inline blocks with overlay zoom on click.
- **TOC integration**: exhibits are listed under their parent heading.
- Implementation: `gallery.js`, `gallery.css`; Pandoc fenced-div syntax (`:::`) to avoid the 4-space code block trap.
### LaTeX Math — Client-side KaTeX
The spec described pure build-time SSR. In practice: Pandoc outputs `class="math"` spans, KaTeX renders client-side from a deferred script. Fully static (no server per request). Revisit if build-time SSR becomes important.
*Note:* The `abstract` field is parsed natively through Pandoc via a custom `abstractField` compiler in `Contexts.hs` (using KaTeX settings) so that LaTeX math in the frontmatter renders identically to the body text.
### Citation pipeline — key subtleties
1. **`Cite` nodes, not `Span` nodes.** `processCitations` with `class="in-text"` CSL does *not* convert `Cite` nodes to `Span class="citation"` nodes in the Pandoc AST — it only populates their inline content and creates the refs div. The HTML writer wraps them in `<span class="citation">` at write time. Our `Citations.hs` must match `Cite` nodes directly.
2. **Hakyll strips YAML frontmatter.** Hakyll reads frontmatter separately; the body passed to `readPandocWith` has no YAML block, so Pandoc `Meta` is empty. `further-reading` keys and custom `bibliography` paths are read from Hakyll's metadata API (`lookupStringList` / `lookupString`) in `Compilers.hs` and passed explicitly to `Citations.applyCitations` to support custom per-essay `.bib` files (defaulting to `data/bibliography.bib`).
3. **`nocite` format.** Each further-reading key must be a *separate*`Cite` node with `AuthorInText` mode and non-empty fallback content — matching what pandoc produces from `"@key1 @key2"` in YAML. A single `Cite` node with multiple citations is not recognized by citeproc's nocite processing.
4. **`collectCiteOrder` queries blocks only**, not the full `Pandoc` (which includes metadata). Querying metadata would pick up the injected `nocite``Cite` nodes and incorrectly classify further-reading entries as inline citations.
### External link icons
Implemented via `data-link-icon` and `data-link-icon-type="svg"` attributes set by `Filters.Links`. CSS uses `mask-image: url(...)` with `background-color: currentColor` so icons inherit the text color and work in dark mode. Icons in `static/images/link-icons/` as SVG files.
### Tags — Hierarchical, no namespace
Tags are slash-separated (`research/mathematics`). A tag is auto-expanded into all ancestor prefixes so that `/research/` aggregates all `research/*` content. Tag pages live directly at `/<tag>/` with no `/tags/` namespace.
### Collapsible sections
`collapse.js` wraps each h2/h3's following siblings in a `.section-body` div and injects a `.section-toggle` button into the heading. State is persisted per heading in `localStorage` under `section-collapsed:<id>`. After each `transitionend`, dispatches `window.resize` to retrigger sidenote positioning. Headings themselves are never hidden, preserving `IntersectionObserver` targets for `toc.js`.
### Atom feed
`/feed.xml` covers all essays and blog posts (up to 30 most recent). A `"content"` snapshot is saved in `Site.hs`*before* template application, so the feed body is just the compiled article HTML (not the full page with nav/footer). Dates from the `date` frontmatter key, formatted as RFC 3339.
### Author system
Authors are treated as a second tag dimension. `build/Authors.hs` provides `buildAllAuthors` (a `buildTagsWith` call keyed to `authors` frontmatter) and `authorLinksField` (a `listFieldWith` context that defaults to `["Levi Neuwirth"]` when no `authors` key is present, so all unattributed content contributes to his author page). Author pages live at `/authors/{slug}/`. `slugify` lowercases and hyphenates; pipe-separated values (`"Name | role"`) strip the role portion via `nameOf`.
### Settings panel
`settings.js` manages four independent settings, all persisted in `localStorage`:
- **Theme** (`data-theme` on `<html>`): light / dark, with `syncThemeButtons()` toggling `.is-active`.
- **Text size**: three steps `[20, 23, 26]` px (small / default / large), written as `--text-size` CSS custom property on `<html>`. Default index is 1 (23 px).
- **Focus mode** (`data-focus-mode` on `<html>`): hides TOC, fades header to 7% opacity until hover.
- **Reduce motion** (`data-reduce-motion` on `<html>`): collapses all `animation-duration` / `transition-duration` to `0.01ms`.
`theme.js` (sync, not deferred) restores all four attributes from `localStorage` before first paint to avoid flash.
### Selection popup
`selection-popup.js` / `selection-popup.css`. A toolbar appears 450 ms after any text selection of ≥ 2 characters. Context is detected from the DOM ancestors of the selection range:
| **prose** (multi-word) | fallback | BibTeX · Copy · DuckDuckGo · Here · Wikipedia |
| **prose** (single word) | `!/\s/.test(text)` | BibTeX · Copy · Define · DuckDuckGo · Here · Wikipedia |
16 languages are mapped to documentation providers (MDN, Hoogle, docs.python.org, doc.rust-lang.org, etc.) via `DOC_PROVIDERS`. **BibTeX** generates a `@online{...}` BibLaTeX entry (key = `lastname + year + firstWord`; selected text in `note={\enquote{...}}`; year scraped from `#version-history li`). **Here** opens `/search.html?q=` in a new tab. **Define** opens English Wiktionary. Popup positions above the selection, flips below if insufficient space; hides on scroll, outside mousedown, or Escape.
### Reading mode (poetry + fiction)
Shared codex layout for creative content. `templates/reading.html` omits the TOC and emits a `<div id="reading-progress">` progress bar instead. `body.reading-mode` (set via `$if(reading)$` in `default.html`) triggers a slightly warmer background (`#fdf9f1` / `#1c1917`). Poetry pages (`body.reading-mode.poetry`) use a 52ch measure, 1.85 line-height, stanza paragraph spacing, and suppressed dropcap/smallcaps lead-in; `poetryCompiler` enables `Ext_hard_line_breaks` so each source newline becomes `<br>`. Fiction pages (`body.reading-mode.fiction`) use a 62ch measure, centered Fira Sans smallcaps chapter headings, and a dropcap + smallcaps lead-in on each `h2 + p`. Progress bar is driven by `reading.js` (scroll position → `width` on `#reading-progress`). CSS and JS loaded conditionally via `$if(reading)$`. Content goes in `content/poetry/*.md` and `content/fiction/*.md`; tags `poetry` / `fiction` route items to the correct portal and library section.
### Score fragment system (option A)
`Filters/Score.hs` walks the Pandoc AST for `Div` nodes with class `score-fragment`. It reads the referenced SVG from disk (path resolved relative to the source file's directory via `getResourceFilePath` + `takeDirectory`), replaces hardcoded black fill/stroke values with `currentColor` (6-digit before 3-digit to prevent partial matches on `#000` vs `#000000`), and emits a `RawBlock "html"``<figure>` carrying `class="score-fragment exhibit"`, `data-exhibit-name`, and `data-exhibit-type="score"` for gallery.js TOC integration. SVGs are inlined at build time and never served as separate files. `gallery.js` discovers `.score-fragment` elements via `discoverFocusableScores`, adds them to the shared `focusables[]` array with `type: 'score'`, and the overlay's `renderOverlay` branches on type — score path clones the SVG into the overlay body (no font-size loop); math path keeps the KaTeX re-render. Overlay body receives class `is-score` for tighter horizontal padding (`2rem 1.5rem` vs `3.5rem 4.5rem`). CSS: background rect removed via `svg > rect:first-child { fill: none }`, SVG responsive via `width: 100%; height: auto`, dark mode via `color: var(--text)`.
**Authoring syntax:**
```markdown
::: {.score-fragment score-name="Main Theme, mm. 1–8" score-caption="The opening gesture."}

:::
```
### Music — Composition landing pages + full score reader (option C)
**Implemented.** Two URLs per composition from one source directory.
The Hakyll `version "score-reader"` mechanism compiles the same `index.md` twice: once as the landing page (default version) and once as the reader (`customRoute` to `music/{slug}/score/index.html`). Score reader uses `makeItem ""` — the prose body is irrelevant; only frontmatter fields are needed.
#### Source directory layout
```
content/music/symphonic-dances/
├── index.md ← composition frontmatter + program notes prose
├── scores/
│ ├── page-01.svg ← one file per score page (LilyPond SVG output)
│ ├── page-02.svg
│ └── symphonic-dances.pdf
└── audio/
├── movement-1.mp3
└── movement-2.mp3
```
SVG, MP3, and PDF files are copied to `_site/` via `copyFileCompiler`. Score reader SVGs are served as separate `<img>` files — inlining a full orchestral score is impractical.
| `$score-pages$` | list | each item: `$score-page-url$` (absolute URL) |
| `$has-movements$` | boolean | present when `movements` non-empty |
| `$movements$` | list | each item: `$movement-name$`, `$movement-page$`, `$movement-duration$`, `$movement-audio$`, `$has-audio$` |
| `$composition$` | flag | `"true"` — gates `score-reader.css` in `head.html` |
`movements` is parsed from the nested YAML using `Data.Aeson.KeyMap` (Aeson 2.x API). `score-pages` are resolved to absolute URLs (`/music/{slug}/{path}`) inside the context so the `data-pages` attribute in the score reader template needs no further processing.
#### Score reader
The reader template embeds page URLs as a comma-separated `data-pages` attribute on `#score-reader-stage`. `score-reader.js` splits on commas and filters empties.
`score-reader-default.html` loads only: `base.css`, `components.css` (for settings panel styles), `score-reader.css`, `theme.js` (sync, pre-paint), `settings.js` (theme toggle in the top bar), `score-reader.js`. No nav, no TOC, no sidenotes, no popups, no gallery, no lightbox.
`score-reader.js` behaviors:
- `navigate(page)`: swaps `<img src>`, updates counter, toggles prev/next disabled states, updates active movement button (last movement whose start page ≤ current page), calls `history.replaceState` for `?p=` deep linking, preloads ±1 pages.
- Keyboard: `ArrowLeft`/`ArrowRight`/`ArrowUp`/`ArrowDown` for page turns; `Escape` → `history.back()`. Suppressed when settings panel is open.
- Dark mode: `[data-theme="dark"] .score-page { filter: invert(1); }` — clean for pure B&W notation; revisit if LilyPond embeds colored elements.
- Mobile: score scrolls horizontally at ≤ 640px (`min-width: 600px` on `<img>`); arrow buttons hidden; pinch-to-zoom is native.
#### Known limitations / future work
- **Full-piece audio**: a `recording` frontmatter key for a complete performance would add a top-level audio player on the landing page. Not yet implemented.
- **LilyPond margin cropping**: the `viewBox` drives scaling but LilyPond's default page includes margins. May need per-composition `viewBox` overrides or CSS `object-fit` once real scores are tested.
### Backlinks — Two-pass dependency-correct system
`build/Backlinks.hs`. The fundamental challenge: backlinks for page A require knowing what other pages link to A, but those pages haven't been compiled yet when A is compiled. Solved with a two-version architecture:
1. **Pass 1** (`version "links"`): each content file is compiled lightly — wikilinks preprocessed, Markdown parsed, AST walked block-by-block. For every internal link, the URL and the HTML of its surrounding `Para` block are recorded as a `LinkEntry { leUrl, leContext }`. Context rendered via `runPure (writeHtml5String opts (Pandoc nullMeta [Plain inlines]))` with `writerTemplate = Nothing` (fragment only). Result serialised as JSON per page.
2. **Pass 2** (`create ["data/backlinks.json"]`): loads all `version "links"` items, inverts the map (target → [source]), resolves each source's title and abstract from its metadata, emits `data/backlinks.json`.
3. **Context** (`backlinksField`): loads `data/backlinks.json` via `load`, looks up the current page's normalised URL, renders `<ul>` with `<details>`-collapsible context per entry.
**Key implementation details:**
- All `loadAll` / `loadAllSnapshots` / `buildTagsWith` / `buildPaginateWith` calls use `.&&. hasNoVersion` to prevent "links" version items from being picked up alongside default versions.
- `isPageLink` filters out `http://`, `https://`, `#`-anchors, `mailto:`, `tel:`, and static-asset extensions (`.pdf`, `.svg`, `.mp3`, etc.).
- JSON encoding uses `TL.unpack . TLE.decodeUtf8 . Aeson.encode` (not `LBSC.unpack`) to preserve non-ASCII characters in context paragraphs.
- `popups.js` excludes `.backlink-source` links from the internal-preview popup (same exception pattern as `.meta-authors`).
### Epistemic Profile
Implemented across `build/Stability.hs`, `build/Contexts.hs`, `templates/partials/page-footer.html`, `templates/partials/metadata.html`, and `static/css/components.css`.
**Context fields provided by `epistemicCtx`** (included in `essayCtx`):
| Field | Source | Notes |
|-------|--------|-------|
| `$status$` | frontmatter `status` | via `defaultContext` |
| `$confidence$` | frontmatter `confidence` | via `defaultContext` |
- Heuristic: ≤ 1 commits or age <14days→*volatile*;≤5commitsandage<90days→*revising*;≤15commitsorage<365days→*fairly stable*;≤30commitsorage<730days→*stable*;otherwise→*established*.
- `IGNORE.txt`: paths listed here use frontmatter `stability`/`last-reviewed` verbatim. Cleared by `> IGNORE.txt` in the Makefile's `build` target (one-shot pins).
**Critical implementation note:** Fields that use `unsafeCompiler` must return `Maybe` from the IO block and call `fail` in the `Compiler` monad afterward — not inside the `IO` action. Calling `fail` inside `unsafeCompiler`'s IO block throws an `IOError` that Hakyll's `$if()$` template evaluation does not catch as `NoResult`, causing the entire item compilation to error silently.
`build/Filters/Viz.hs` walks the AST for `Div` blocks with class `figure` or `visualization`.
**Static figures (`.figure`):** reads the `script` attribute, runs `python3 <script>` with the source file's directory as cwd, captures stdout as SVG. Replaces hardcoded `#000000`/`black` fills/strokes with `currentColor` (same trick as `Filters.Score`). Wraps in `<figure class="viz-figure">`. Script is expected to import `tools/viz_theme` and call `save_svg()` which writes to stdout.
**Interactive figures (`.visualization`):** runs the script, expects Vega-Lite JSON on stdout. Embeds as `<script type="application/json" class="vega-spec">` inside a `.vega-container` div. `viz.js` finds all `.vega-spec` scripts, stores parsed spec on `container._vegaSpec`, calls `vegaEmbed`. Always applies the site's monochrome Vega config, ignoring the spec's own `config`. MutationObserver on `document.documentElement[data-theme]` triggers `reRenderAll()` on theme change.
**Frontmatter:** `viz: true` gates CDN loading of Vega/Vega-Lite/Vega-Embed and `viz.js` in `head.html` via `$if(viz)$`.
**`tools/viz_theme.py`:** `apply_monochrome()` sets matplotlib rcParams (transparent backgrounds, black lines); `save_svg(fig)` writes SVG to stdout via `io.StringIO`; `LINESTYLE_CYCLE` provides dash-pattern sequences for multi-series charts (no color distinction needed).
**Key architecture:** master certifying key in `~/.gnupg` (passphrase-protected, used for email). Dedicated signing keyring at `~/.gnupg-signing/` holds: `sec#` (master stub, no secret) + `ssb` Ed25519 signing subkey (with secret). Correct isolation: `gpg --export-secret-subkeys "FINGERPRINT!"` exports only the subkey secret.
**Passphrase caching:** GPG 2.4's `passwd` in `--edit-key` requires the master secret to be present — it cannot change a subkey passphrase in a stub+subkey-only keyring. Instead, `gpg-preset-passphrase` (`/usr/lib/gnupg/gpg-preset-passphrase`) caches the passphrase by keygrip directly in the agent. `~/.gnupg-signing/gpg-agent.conf` sets `allow-preset-passphrase` and `max-cache-ttl 86400`. `tools/preset-signing-passphrase.sh` prompts via the terminal, calls `gpg-preset-passphrase --preset <keygrip>`. Must be run once per boot (or when the 24h cache expires).
**Popup preview:** `popups.js``sigContent` provider fetches the `.sig` URL (same-origin), renders the ASCII armor in a `<pre>` inside a `.popup-sig` div. Bound to `a.footer-sig-link` explicitly in `bindTargets`, bypassing the footer-exclusion guard on internal links. Result cached in the shared `cache` map.
**nginx:** `.sig` files need no special handling — they're served as static files alongside `.html`. The `try_files` directive handles `$uri` directly.
### Embedding pipeline + semantic search
**Model unification:** Both similar-links (page-level) and semantic search (paragraph-level) use `all-MiniLM-L6-v2` (384 dims). This is a deliberate simplification: the same model runs at build time (Python/sentence-transformers) and query time (browser/transformers.js `Xenova/all-MiniLM-L6-v2` quantized), guaranteeing that query vectors and corpus vectors are in the same embedding space.
**Build-time (`tools/embed.py`):** One HTML parse pass per file extracts both the full-page text (for similar-links) and individual paragraphs (for semantic search). Model is loaded once and both encoding jobs run sequentially. Outputs: `data/similar-links.json`, `data/semantic-index.bin` (raw `float32`, shape `[N_paragraphs, 384]`), `data/semantic-meta.json`. All three are gitignored (generated). Staleness check: skips the entire run if all three outputs are newer than all `_site/` HTML.
**Binary index format:** `para_vecs.tobytes()` writes a flat, little-endian `float32` array. In JS: `new Float32Array(arrayBuffer)`. No header, no framing — row `i` starts at byte offset `i × 384 × 4`. This is the simplest possible format and avoids a numpy/npy parser in the browser.
**Client-side search (`semantic-search.js`):** Dynamically imports `@xenova/transformers@2` from jsDelivr CDN on first query (lazy — no load cost on pages that never use semantic search). Fetches binary index + metadata JSON (also lazy, browser-cached). Brute-force dot product over a `Float32Array` in a tight JS loop — fast enough at <5kparagraphs;revisitwithaWASMFAISSbindingifthecorpusgrowsbeyond~20kparagraphs.Vectorsareunit-normalisedatbuildtime,sodotproduct =cosinesimilarity.
**Tab default + localStorage:** Keyword (Pagefind) is the default on first visit — zero cold-start. User's last-used tab is stored under `search-tab` in `localStorage` and restored on load, so returning users who prefer semantic always land there. If `localStorage` is unavailable (private browsing restrictions), falls back silently to keyword.
**Self-hosted model:** `semantic-search.js` sets `mod.env.localModelPath = '/models/'` and `mod.env.allowRemoteModels = false` immediately after the CDN import. transformers.js then resolves model files as `GET /models/all-MiniLM-L6-v2/{file}`, which are same-origin. `make download-model` (= `tools/download-model.sh`) fetches 5 files from HuggingFace: `config.json`, `tokenizer.json`, `tokenizer_config.json`, `special_tokens_map.json`, `onnx/model_quantized.onnx`. The files live in `static/models/` (gitignored) and are copied to `_site/models/` by the existing `static/**` Hakyll rule.
**CSP:** `script-src` requires `https://cdn.jsdelivr.net` for the transformers.js library import. `connect-src` stays `'self'` — all model weight fetches are same-origin after `allowRemoteModels = false`. The binary index and meta JSON are also same-origin.
### Responsive images — WebP `<picture>` wrapping
`build/Filters/Images.hs` inspects each `Image` inline's `src`. If it is a local raster (not starting with `http://`, `https://`, `//`, or `data:`; extension `.jpg`/`.jpeg`/`.png`/`.gif`), it emits `RawInline (Format "html")` containing a `<picture>` element:
The WebP `srcset` is computed at build time by `System.FilePath.replaceExtension`. No IO is needed in the filter — the `<source>` is always emitted; browsers silently ignore it if the file doesn't exist (falling back to `<img>`). SVG, external URLs, and `data:` URIs remain plain `<img>` tags. Images inside `<a>` links get no `data-lightbox` marker (same as before).
`tools/convert-images.sh` performs the actual conversion: `cwebp -q 85` per file. Runs before Hakyll in `make build` and is also available as a standalone `make convert-images` target. Exits 0 with a notice if `cwebp` is not installed, so the build never fails on machines without `libwebp`. Generated `.webp` files are gitignored; `git add -f` to commit an authored WebP.
**Quality note:** `-q 85` is a good default for photographic images. For pixel-art, diagrams, or images that are already highly compressed, `-lossless` or a higher quality setting may be appropriate (edit the script).
### Annotations (infrastructure only)
`annotations.js` stores annotations as JSON in `localStorage` under `site-annotations`, scoped per `location.pathname`. On `DOMContentLoaded`, `applyAll()` re-anchors saved annotations via a `TreeWalker` text-stream search (concatenates all text nodes in `#markdownBody`, finds exact match by index, builds a `Range`, wraps with `<mark>`). Cross-element ranges use `extractContents()` + `insertNode()` fallback. Four highlight colors (amber / sage / steel / rose) defined in `annotations.css` as `rgba` overlays with `box-decoration-break: clone`. Hover tooltip shows note, date, and delete button. Public API: `window.Annotations.add(text, color, note)` / `.remove(id)`. The selection-popup "Annotate" button is currently removed pending a UI revision.
---
## VII. Reference: Inspirations
- **gwern.net** — Primary model (Gwern Branwen + Said Achmiz). Semantic zoom, sidenotes, popups, monochrome, Pandoc+Hakyll.
- **Edward Tufte** — Sidenotes, information design
- **Matthew Butterick's Practical Typography** — Web typography in practice
- **Traditional book design** — The standard to aspire to on screen
---
*This specification is a living document updated as implementation progresses.*