Compare commits

23 Commits

Author SHA1 Message Date
Levi Neuwirth 5d344f940e Last audit stragglers: scaffolder, refreeze safety, atomic-write polish
- add-popup-source.sh: slug validated against ^[a-z0-9-]+$ before nginx
  interpolation; UPSTREAM_HOST derived unconditionally so the CSP
  reminder fires in the no-proxy case — which is exactly when the host
  must be added to connect-src (AUDIT §4.8)
- refreeze.sh: backs up the freeze and restores it on a failed resolve
  instead of leaving the repo with no freeze file (§4.9)
- einops gets the policy-mandated upper bound and a comment naming its
  consumer (nomic's remote modeling code) (§1.5)
- Makefile: pdftoppm failures warn instead of vanishing in the while
  pipeline; .NOTPARALLEL guards deploy's clean->build->sign ordering
  against -j invocations (§8.4)
- Atomic writers (embed, archive, the three sidecar extractors):
  PID-unique temp names so concurrent runs can't interleave, cleanup on
  failure everywhere, fsync where the artifact is not trivially
  regenerable (§4.10)

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 11:43:14 -04:00
Levi Neuwirth 23bc2d0dc1 Frontend tail: keyboard access, idempotence, input edge cases
- gallery.js: math/score focus overlays are keyboard-activatable
  (role=button, tabindex, Enter/Space) and focus return on close lands
  on a focusable trigger (AUDIT §5.7)
- annotations.js: marks are focusable; Enter/Space pins the tooltip
  with focus moved to its Delete button, Escape dismisses — the delete
  affordance is finally reachable without a mouse (§5.7)
- transclude.js: nested transclusions resolve (depth-capped at 3, with
  ancestor-chain cycle rejection rendering the existing error style);
  collapse.js reinit is idempotent via data-collapse-bound (§5.7)
- copy.js excludes the button label from code-less <pre> copies;
  score-reader.js stops rewriting plain loads to ?p=1; search-filters
  treats non-numeric threshold input as inactive instead of a
  match-everything >=0 filter; selection-popup no longer re-summons
  the toolbar while typing capitals in the annotation picker (§5.8)

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 11:25:19 -04:00
Levi Neuwirth 9f61ce5949 Tooling, manifest, and content polish
- import-photo.sh deletes the copied JPEG when EXIF stripping fails, so
  the auto-commit can never publish GPS/serial metadata (AUDIT §4.11)
- pre-commit-marks hook: tab-aware path parsing, probes the staged blob
  rather than the working tree (§4.11)
- preset-signing-passphrase uses printf; stamp-build-time writes via
  temp + os.replace; archive.py passes -- to pdftotext and verifies the
  vendored monolith binary against its recorded sha256 (mismatch is
  fatal, consistent with the tool's integrity contract); extract-exif
  ./-prefixes relative paths (§4.11)
- blog-post.html: id="similar-links"/"backlinks" each appear once;
  rendered output unchanged (§6.4)
- site.webmanifest: start_url/scope/description added, maskable icon
  purpose restored alongside any (§9.3)
- Frontmatter cleanup: scaffold comments out of scaling_outage,
  dangling null confidence-history keys removed (populated ones kept),
  dead modified: key dropped from colophon (§6.4)
- canto31.jpg: 4.0 MB -> 1.9 MB (2400px, q80, grayscale — the source
  is a monochrome Doré engraving, so single-channel is colorimetrically
  lossless); webp sidecar regenerated (§6.4, prior-audit §6.1)

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 11:13:34 -04:00
Levi Neuwirth 56afdb867a Feature modules: URL normalization, Maybe-trust, proper medians
- Empty/all-comments manifest.yaml is the empty archive, not a fatal
  parse error (AUDIT §3.11)
- Backlinks normaliseUrl strips index.html like SimilarLinks, so links
  to canonical directory URLs invert again; Stats normUrl updated in
  lockstep (§3.12)
- PDF viewer file= query value percent-encoded (hand-rolled RFC 3986
  encoder; network-uri is not a dependency) (§3.13)
- Photography feed thumbnails embed for flat singles and series
  children, not just directory entries (§3.14)
- Marks trust is Maybe Int: missing confidence/evidence collapses the
  figure to the bare frame as documented, instead of a literal
  "0 TRUST"; result-shape glyph centers when no score (§3.15)
- Unknown catalog categories fold into one Other bucket; medians take
  the mean of middle elements; protocol-relative URLs excluded from
  backlinks; @string/@comment/@preamble skipped in BibTeX parsing;
  watch-staleness of the once-per-process archive reads documented;
  stale comments fixed (§3.16, §3.9)

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 11:13:34 -04:00
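The median change in this commit ("medians take the mean of middle elements") can be sketched as follows. This is an illustration in Python, not the site's Haskell; the function name `median` and the choice to return `None` for empty input are assumptions, not taken from the module.

```python
from typing import Optional, Sequence


def median(values: Sequence[float]) -> Optional[float]:
    """Return the median, averaging the two middle elements for even lengths.

    Returns None for empty input (an assumption; the real module may differ).
    """
    if not values:
        return None
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        # Odd length: a single middle element exists.
        return float(ordered[mid])
    # Even length: take the mean of the two middle elements.
    return (ordered[mid - 1] + ordered[mid]) / 2
```

Picking only one of the two middle elements biases the result for even-length inputs, which is presumably what the "proper medians" wording in the commit title refers to.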
Levi Neuwirth f254ce866e Filters: fence/code-span awareness, host matching, nested-header skip
- SourceRefs trigger whitelist aligned to the /source/ serving
  whitelist (drops content/, yaml-source/, broad static//tools//data
  prefixes; adds .bib); existsCached no longer memoizes non-existence,
  so files created under make watch are picked up (§2.5, §2.16)
- fill/stroke hex replacement is boundary-aware: #000080 and 8-digit
  RGBA forms can no longer be corrupted into currentColor80 (§2.12)
- Wikilinks/Transclusion/EmbedPdf skip fenced code blocks (shared
  CommonMark fence tracker), and wikilinks additionally skip inline
  code spans — the syntax-documentation essay now renders its own
  examples literally while live wikilinks still convert (verified both
  ways in output) (§2.13)
- domainIcon matches the extracted host by label suffix instead of
  substring-of-URL; extractHost also strips userinfo (§2.14)
- webpSrc escaped in srcset; internal PDF links no longer double-
  classified; Smallcaps/Archive header-skip now holds at every nesting
  depth via protect/restore walks (§2.17)

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 11:13:08 -04:00
Levi Neuwirth c8eeaaa9bc Core build cleanups: guards, pattern unification, noResult hygiene
- Library page no longer hard-depends on content/library.md; deleting
  it degrades to no intro block (AUDIT §2.8)
- primaryPortalOf accepts scalar comma-form tags via getTags, matching
  the tag system (§2.9)
- allContent gains me/ and memento-mori/ so their outgoing links join
  the backlinks graph; photography exclusion now documented (§2.10)
- Paginated tag pages partition AND sort by the same revision-aware
  display date — cross-page order is monotone again (§2.11)
- New stripPrefixRoute replaces gsubRoute at 17 call sites: prefix-only
  stripping, no mid-path mangling; route inventory verified identical
  (§2.15)
- random-pages uses canonical patterns (collection poems randomizable);
  pattern literals replaced with Patterns imports; duplicate local
  poetry patterns deleted; flat/collection poetry rules merged (§2.17)
- noResult instead of empty-list/fail for tagLinksField, dotsField,
  abstract/description/summary/bibliography/further-reading, plus the
  confidence-trend, overall-score, has-score, has-movements, and
  movement-audio fields — no more empty wrappers or [ERROR] log noise
  for legitimately-absent values (§2.17)
- tagItemCtx composes siteCtx, so monograms render on tag pages (§2.17)
- readingTime ceilings (399 words -> 2 min); authorSlugify comment
  fixed to match behavior, code untouched for URL stability; stale
  portal-count comments corrected (§2.17)

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 11:13:08 -04:00
Levi Neuwirth 945086421a embed.py: hash-cache the paragraph pass; drop the dead mtime skip
The 'skip if outputs newer than every HTML' check could never fire:
stamp-build-time.py rewrites every page's footer AFTER embed.py runs,
so the comparison was always false and the full MiniLM paragraph pass
(and model load) ran on every build (AUDIT §4.3). Replaced with the
same content-hash cache the page pass already had — generalized
load/save_vec_cache, keyed by sha256 of the input text, invalidated on
model/revision/dim change. A no-change rerun now does no model loads:
measured 97s cold -> 4.8s warm.

Also strips section.footnotes from extraction: the new no-JS fallback
duplicates each sidenote's text at document end, which would double
footnotes in search results and skew page similarity.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 10:51:01 -04:00
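A minimal sketch of the content-hash cache described in this commit, under stated assumptions: the real cache is an npz file, while this version uses JSON and plain lists so that it needs only the standard library. The names `load_vec_cache` and `save_vec_cache` follow the commit message; their signatures, `text_key`, `embed_all`, and the `embed_fn` callback are hypothetical.

```python
import hashlib
import json
import os
from typing import Callable, Dict, List

Vector = List[float]


def text_key(text: str) -> str:
    """Cache key: sha256 of the exact input text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def load_vec_cache(path: str, model: str, revision: str, dim: int) -> Dict[str, Vector]:
    """Load cached vectors; return an empty cache if the model identity changed."""
    try:
        with open(path, encoding="utf-8") as fh:
            data = json.load(fh)
    except (OSError, ValueError):
        return {}  # missing or half-written cache: start over
    if not isinstance(data, dict):
        return {}
    if data.get("meta") != {"model": model, "revision": revision, "dim": dim}:
        return {}  # invalidated on model/revision/dim change
    return data.get("vectors", {})


def save_vec_cache(path: str, model: str, revision: str, dim: int,
                   vectors: Dict[str, Vector]) -> None:
    """Write the cache through a PID-unique temp file, then rename into place."""
    tmp = f"{path}.{os.getpid()}.tmp"
    payload = {"meta": {"model": model, "revision": revision, "dim": dim},
               "vectors": vectors}
    with open(tmp, "w", encoding="utf-8") as fh:
        json.dump(payload, fh)
    os.replace(tmp, path)


def embed_all(texts: List[str], cache: Dict[str, Vector],
              embed_fn: Callable[[List[str]], List[Vector]]) -> List[Vector]:
    """Embed texts, calling embed_fn only for texts whose hash is not cached."""
    missing = list(dict.fromkeys(t for t in texts if text_key(t) not in cache))
    if missing:  # a fully warm cache never touches the model
        for text, vec in zip(missing, embed_fn(missing)):
            cache[text_key(text)] = vec
    return [cache[text_key(t)] for t in texts]
```

Because the key depends only on the input text, rewriting a page's footer after the embedding pass does not invalidate entries whose extracted text is unchanged, unlike the mtime comparison this commit removes.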
Levi Neuwirth b2951c0c2c Branding diet: logo sprite via <use>, lean favicon.ico, simple mask icon
- The ~33 KB traced logo moves from an inlined-per-page partial to
  /logo-sprite.svg referenced with <use> — cached once instead of
  shipped on every page (homepage HTML: 46 KB -> 13 KB). CSS custom
  properties cascade into the use shadow tree, so the two-tone cutout
  is unchanged (AUDIT §9.1)
- favicon.ico regenerated at 16/32/48 from the 512px master: 71 KB ->
  15 KB; modern browsers take the SVG anyway, the .ico is the legacy
  fallback (§9.2)
- link-icons/internal.svg restored to the simple 4 KB path: it renders
  at 0.7-1.6 rem through a CSS mask, where the 33 KB traced detail
  cannot resolve (§9.2)

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 10:43:06 -04:00
Levi Neuwirth aeb2937f7c Drafts are local-only: untrack the four committed ones
.gitignore has declared content/drafts/ local-only working notes since
the rule was added, but four drafts were already tracked — ignore rules
don't untrack, so make build's auto-commit kept staging and deploy kept
pushing them (AUDIT §6.3). Untracked with --cached; the files remain on
disk and still build in dev. Also moved inclusionist-manifesto.md into
drafts/essays/ where the draft rule actually matches it (§6.1), and
un-shadowed the tracked .env.example from the credential patterns.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 10:40:05 -04:00
Levi Neuwirth 8ca22a45d2 Sidenotes: emit the section.footnotes fallback the CSS expects
The filter consumes every Pandoc Note, so the "standard Pandoc-
generated section.footnotes" its doc claimed as the no-JS fallback
never existed — below 1500px with JS disabled, footnote content was
simply invisible (AUDIT §2.3). The filter now collects consumed notes
and appends the section itself: letter labels, jump targets for the
in-text refs (which now point at the visible fallback item), and
doc-backlink returns. sidenotes.js pairs ref/note by element id and
preventDefaults clicks, so behavior with JS is unchanged.

Verified in output: per-page item count matches inline sidenote count;
refs target #fn-<label>; backlinks target #snref-<label>.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 10:37:28 -04:00
Levi Neuwirth 4e28c82e4c Fix SIMD essay repository URL: add missing owner segment
https://git.levineuwirth.org/where-simd-helps returned 404; the
owner-qualified form returns 200 (AUDIT §6.2).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 09:44:26 -04:00
Levi Neuwirth 8040be1aee Docs: align WRITING.md and README with the implementation
- js: page-script paths are site-root-relative, not content-relative
  (AUDIT §7.1)
- directory-form standalone pages need a dedicated Site.hs rule; flat
  content/<page>.md is the generic form (§7.2)
- portal table: add the missing Photography row (§7.3)
- document the implemented-but-undocumented summary:, revised:, and
  keywords: fields, including a Revision dates section (§7.4)
- default citation style is Chicago Notes Bibliography, not
  Author-Date; hover previews come from popups.js, not the deleted
  citations.js (§7.5)
- history: entries may be authored in any order (sorted at build
  time); examples reordered newest-first (§3.5)
- README: make watch runs Hakyll's live-reload preview server (§7.5)

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 09:43:25 -04:00
Levi Neuwirth caa113e036 Frontend: search races, lightbox a11y, popup edge cases
- semantic-search.js: generation token prevents stale results from
  rendering over newer queries; in-flight dedup on the index fetch;
  index/meta size consistency check fails loudly instead of NaN
  ranking (AUDIT §5.5)
- lightbox.js: triggers keyboard-activatable (role=button, tabindex,
  Enter/Space); Tab trapped inside the aria-modal overlay, modeled on
  gallery.js (§5.6)
- nav.js: portal toggle persists via guarded safeStorage so
  storage-blocked contexts can't kill the toggle (§5.7)
- popups.js: provider url() throws (malformed percent-encoding) are
  treated as no-popup; future dates render nothing instead of
  "N days ago" (§5.7)
- search.js: missing PagefindUI degrades to a console warning instead
  of aborting the whole handler (§5.7)
- citations.js: deleted — dead code superseded by popups.js (§5.7)

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 09:43:25 -04:00
Levi Neuwirth c17c203747 Tooling robustness: atomic writes, verified downloads
- archive.py: PROVENANCE.json / archive-index.json / archive-state.json
  now written atomically (tmp + os.replace) — a truncated integrity
  record is the one thing this tool must never produce (AUDIT §4.4);
  manifest entries validated as mappings up front (§4.7); refresh
  rejects provenance with a missing/empty artifact key instead of
  crashing on IsADirectoryError (§4.7); wayback save URL quotes
  unsafe characters (§4.7)
- download-leaflet.sh: existing files are re-verified before being
  skipped, and downloads land in a .part temp moved into place only
  after checksum verification — a failed verification can no longer
  leave a bad file that the next run silently accepts (§4.5)
- download-model.sh, convert-images.sh: same temp-then-move pattern so
  interrupted downloads/conversions never persist at final paths (§4.6)

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 09:43:25 -04:00
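The temp-then-move pattern this commit applies to downloads and integrity records can be sketched in Python as below. It is an illustration of the technique, not the code in `archive.py` or the shell scripts; `install_verified`, `is_valid`, and the `.part` naming with a PID suffix are assumptions made for the example.

```python
import hashlib
import os


def install_verified(data: bytes, dest: str, expected_sha256: str) -> None:
    """Write data to a temp file beside dest, verify it, then move it into place."""
    part = f"{dest}.{os.getpid()}.part"  # same directory, so the rename stays atomic
    try:
        with open(part, "wb") as fh:
            fh.write(data)
            fh.flush()
            os.fsync(fh.fileno())  # make the bytes durable before the rename
        with open(part, "rb") as fh:
            actual = hashlib.sha256(fh.read()).hexdigest()
        if actual != expected_sha256:
            raise ValueError(f"checksum mismatch for {dest}")
        os.replace(part, dest)  # readers see the old file or the new one, never a partial
    except BaseException:
        # Cleanup on failure: never leave a temp file for a later run to trip over.
        try:
            os.unlink(part)
        except FileNotFoundError:
            pass
        raise


def is_valid(dest: str, expected_sha256: str) -> bool:
    """Re-verify an existing file instead of trusting its mere presence."""
    try:
        with open(dest, "rb") as fh:
            return hashlib.sha256(fh.read()).hexdigest() == expected_sha256
    except FileNotFoundError:
        return False
```

A caller would skip the download only when `is_valid` returns true, which is the "re-verified before being skipped" behavior the commit describes for `download-leaflet.sh`.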
Levi Neuwirth c68d03af31 Fix audit MEDs in feature modules
- Backlinks: handle Plain blocks (tight list items) and DefinitionList
  in link extraction — links in ordinary bullet lists were invisible to
  the backlinks system (AUDIT §3.3)
- Sidenotes: render note bodies with a KaTeX writer so footnote math
  reaches the client-side KaTeX pass instead of degrading to italics
  (§2.4)
- Archive: join manifest to provenance on normalised URLs like every
  other comparison in the system — an equivalent-form URL edit silently
  unpublished the page while links kept pointing at it (§3.6)
- Photography: flat singles get their basename as slug and root-level
  asset paths in map.json (§3.7); geo-precision now fails closed — an
  unrecognised value (typo'd "hidden") suppresses the pin instead of
  publishing rounded coordinates (§3.8)
- Stability: age is measured first-commit -> today, not the commit
  span, so quiet time stabilises a piece as documented (§3.4);
  history: entries are sorted newest-first by date regardless of
  authored order (§3.5); pinned pages format last-reviewed like the
  git branch (§3.10)

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 09:43:25 -04:00
Levi Neuwirth 902e43ea19 Add /poetry/ and /fiction/ indexes; widen tag-collision guard
Nav, the home portal grid, and the library have linked both URLs since
the portals were added, but no rule generated either index — confirmed
404s in production (AUDIT §2.1). Both rules mirror the essays index;
fiction renders an empty list until content exists.

sectionOwnedTopLevelTags now lists every namespace owning a
<name>/index.html route, not just photography — Hakyll silently
overwrites on duplicate routes, so an essay tagged e.g. "music" would
have clobbered a real section landing (AUDIT §2.2).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 09:25:50 -04:00
Levi Neuwirth f11495ff9a Fix audit tooling/infra findings
- embed.py: pin nomic's auto_map modeling repo via code_revision —
  revision= alone left nomic-bert-2048 unpinned under
  trust_remote_code (AUDIT §1.3; verified loadable with
  HF_HUB_OFFLINE=1). Catch BadZipFile/EOFError when loading the page
  cache so a half-written npz is discarded, not fatal (§4.2), and
  unlink the tmp file on a failed save (§4.1)
- nginx: collapse the CSP to one physical line — nginx has no line
  continuation in quoted strings, so the old value embedded literal
  backslash+LF bytes, illegal in HTTP/2 (§8.1). Add the externals the
  site actually uses: KaTeX webfonts + onnxruntime wasm via jsdelivr,
  and the popup provider APIs popups.js documents (§8.2)
- Makefile: pathspec-limit the auto-commit to content/ so pre-staged
  unrelated work is no longer swept into auto: commits (§8.3)

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 09:21:47 -04:00
Levi Neuwirth c64f3d63c0 Fix audit frontend MEDs
- score-reader template: load utils.js before theme.js — without
  lnUtils.safeStorage the saved theme/text-size never restored on
  score pages (AUDIT §5.1)
- search-filters: expand trailing-slash pathnames to .../index.html
  before the epistemicMeta lookup; clean-URL pages were silently
  bypassing every active filter (AUDIT §5.2)
- viz: treat cappuccino as a dark theme so charts stop rendering
  near-black marks on a dark brown background (AUDIT §5.3)
- collapse: namespace section-collapsed keys by pathname (Pandoc
  auto-slugs recur across essays) and go through safeStorage like the
  rest of the site (AUDIT §5.4)

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 09:21:47 -04:00
Levi Neuwirth 7ca937d98c Fix audit HIGHs/MEDs in build code
- ArchiveIndex: guard rawIndex/rawState with doesFileExist so a fresh
  clone (gitignored data/ JSONs absent) degrades to empty instead of
  crashing — the behavior the module doc already promised (AUDIT §1.2)
- Commonplace: decode YAML via encodeUtf8, not Char8.pack, which
  truncates each Char to 8 bits and so mangles every codepoint above
  0x7F (AUDIT §3.2)
- Stats: DayOfWeek is ISO-numbered (Mon=1..Sun=7); dowOf and weekStart
  assumed Mon=0..Sun=6, clipping every Sunday cell outside the heatmap
  viewBox and starting weeks on Sunday (AUDIT §3.1)
- Site: epistemicEntry now honors the proved/proven confidence sentinel
  like Contexts.overallScoreField (AUDIT §2.6)
- Contexts: affiliationField returns noResult instead of an empty list,
  so essays without affiliation no longer render an empty meta row
  (AUDIT §2.7)

Verified: full site build passes; proved page gets score=100 in
epistemic-meta.json; empty .meta-affiliation gone; heatmap rows
y=22..94 all inside the 104-high viewBox.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 09:21:30 -04:00
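The day-of-week bug fixed in Stats has a direct analogue in Python, where `date.isoweekday()` is ISO-numbered (Mon=1..Sun=7) and `date.weekday()` is zero-based (Mon=0..Sun=6). The sketch below shows the corrected arithmetic; `dow_row` and `week_start` are illustrative names, and the Monday-first row layout is an assumption inferred from the commit message.

```python
from datetime import date, timedelta


def dow_row(d: date) -> int:
    """Zero-based heatmap row: Monday=0 .. Sunday=6."""
    # isoweekday() is Mon=1 .. Sun=7, so subtract 1 before using it as an index.
    return d.isoweekday() - 1


def week_start(d: date) -> date:
    """The Monday that begins d's ISO week."""
    return d - timedelta(days=d.isoweekday() - 1)
```

Using the ISO number directly as a zero-based index puts Sunday in row 7, one past the last row, which matches the clipped Sunday cells described above.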
Levi Neuwirth 70ad44e9f4 Add 2026-06-09 repository audit findings
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-09 18:57:43 -04:00
Levi Neuwirth 7c5354efa7 embed.py: split page vs paragraph embedding models
Pages (similar-links.json, build-only) move to nomic-embed-text-v1.5
(768d) with an on-disk npz cache; paragraphs (browser semantic search)
stay on all-MiniLM-L6-v2 (384d), so the client contract is unchanged.
WRITING.md search row updated accordingly. einops added for nomic's
remote modeling code; cache gitignored with a trailing glob so
interrupted-write debris is covered too.

Known follow-ups (AUDIT-2026-06-09.md §1.3, §4): pin the
nomic-bert-2048 remote code, catch BadZipFile in cache loads, fix the
staleness check defeated by stamp-build-time ordering.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-09 18:57:43 -04:00
Levi Neuwirth 37665f67db Branding: traced logo mark, regenerated favicons, og-image
New inline logo-mark.svg partial in the nav (two-tone cutout via
--logo-ink/--logo-bg), regenerated favicon set + web-app manifest icons
from the new mark, 1200x630 og-image wired into head.html.

Known follow-ups (AUDIT-2026-06-09.md §9): the traced SVG is ~33 KB
inlined per page, favicon.ico carries 128/256px entries, and the
webmanifest dropped its maskable purpose.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-09 18:57:34 -04:00
Levi Neuwirth a7b3b9cd07 Refreeze after system update: distributive 0.6.3 et al.
The pinned distributive 0.6.2.1 conflicted with the pacman package db
(comonad-5.0.10 built against 0.6.3), making a fresh solve impossible —
same failure mode as the 2026-05-07 audit's aeson pin. Regenerated via
tools/refreeze.sh; cabal build --dry-run now resolves.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-09 18:57:25 -04:00
93 changed files with 2919 additions and 1055 deletions

6
.gitignore vendored

@@ -10,6 +10,9 @@ _cache/
 **/.env
 **/.env.*
 **/*.env
+# .env.example is documentation (tracked), not a credential file — the
+# patterns above would otherwise shadow it for status/add purposes.
+!.env.example
 **/*.key
 **/*.pem
 **/*.p12
@@ -73,6 +76,9 @@ data/build-stamp.txt
 data/last-build-seconds.txt
 data/semantic-index.bin
 data/semantic-meta.json
+# Both embed caches (pages + paragraphs); the trailing glob also
+# catches interrupted-write debris (.tmp / .tmp.npz)
+data/embed-cache-*
 # Archive: generated text + its staleness stamp (recreated from the
 # committed artifact on every build — deterministic, so committing them is

931
AUDIT-2026-06-09.md Normal file

@ -0,0 +1,931 @@
---
title: Repository audit
date: 2026-06-09
---
# Repository audit — levineuwirth.org (2026-06-09)
Comprehensive audit of the repo on `main` at commit `620b974` (working tree
modified: branding refresh across `static/` + `templates/partials/`, plus
`tools/embed.py` rework; untracked `static/og-image.png`,
`templates/partials/logo-mark.svg`, `data/embed-cache-pages.npz.tmp.npz`).
Severity legend: **HIGH** (likely to break a build, cause data loss, or
expose a security weakness) — **MED** (latent bug, brittleness, or
documentation drift) — **LOW** (minor robustness gap or fragile assumption) —
**NIT** (style, polish, or paranoia).
Numbers are file:line against the working tree at audit time. Findings
marked "verified" were reproduced empirically (solver runs, built `_site/`
output inspection, live HTTP checks, binary parsing); the rest were
confirmed by reading the code.
Prior audit: `AUDIT.md` (2026-05-07). Follow-up status in §10.
---
## 1. Build & dependency chain
### 1.1 `cabal.project.freeze` is unsolvable again — next clean build fails — **HIGH**
`cabal build --dry-run` fails today (verified): the freeze pins
`distributive ==0.6.2.1`, but the system (pacman) GHC package db has
`comonad-5.0.10` built against `distributive-0.6.3`:
```
rejecting: distributive-0.6.3/installed... (constraint from
cabal.project.freeze requires ==0.6.2.1)
After searching the rest of the dependency tree exhaustively...
```
The conflict set also names aeson, warp, hakyll, http2, semigroupoids. This
is the same failure mode as prior-audit §1.1 — that audit's specific aeson
pin was fixed (now 2.2.2.0/hashable 1.4.7.0), but a different package broke
the same way after a system update. Recent builds succeed only off the
cached `dist-newstyle/cache/plan.json`; the freeze file has since changed,
so the next cabal invocation re-solves and fails. Because `make deploy`
starts with `make clean`, the next deploy hits this. `levineuwirth.cabal`'s
own bounds are compatible with the freeze — the conflict is
freeze-vs-installed-db, not freeze-vs-cabal-file.
Fix: `tools/refreeze.sh` (written for exactly this post-`pacman -Syu`
situation). The underlying fragility — freezing against a mutable system
package db — remains; consider documenting the refreeze step as part of any
system-upgrade ritual. *(In progress at time of writing.)*
### 1.2 Missing `data/archive-index.json` / `archive-state.json` crashes the build — **HIGH**
`build/ArchiveIndex.hs:134-146`. The module doc (lines 18-22) promises "An
absent or malformed file degrades safely: an empty index makes the link
consumers no-op; an absent state file makes every entry @Live@." But
`rawIndex = unsafePerformIO $ do decoded <- A.eitherDecodeFileStrict' indexPath`
(and identically `rawState`) never checks `doesFileExist`, and aeson's
`eitherDecodeFileStrict'` throws an uncaught `IOException` on a missing
file (verified: `withBinaryFile: does not exist`). Both files are
gitignored (`.gitignore:84-85`), so a fresh clone or a no-`.venv` build —
the exact path `build/Archive.hs:20-24` promises to support — throws when
the CAF is first forced. Contrast `readUrlSet` (line 109) in the same file,
which guards correctly. Currently latent on this machine only because both
generated files happen to exist.
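The "degrades safely" contract quoted above can be illustrated with a short Python analogue. The module in question is Haskell, so this is only a sketch of the guard, and `load_json_or_default` is a hypothetical name.

```python
import json
from typing import Any


def load_json_or_default(path: str, default: Any) -> Any:
    """Return the parsed JSON at path, or default if it is absent or malformed."""
    try:
        with open(path, encoding="utf-8") as fh:
            return json.load(fh)
    except FileNotFoundError:
        return default  # fresh clone: the gitignored file was never generated
    except ValueError:
        return default  # malformed or truncated JSON (JSONDecodeError is a ValueError)
```

The point of the finding is that the missing-file case must be handled explicitly: a decoder that only reports parse errors still throws when the file cannot be opened at all.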
### 1.3 `embed.py` `trust_remote_code=True` executes unpinned third-party code — **HIGH**
`tools/embed.py:329` (line ~341 in the uncommitted version). The new
page-model load is
`SentenceTransformer(PAGE_MODEL_NAME, revision=PAGE_MODEL_REVISION, trust_remote_code=True)`.
The `revision` arg pins only the `nomic-ai/nomic-embed-text-v1.5` repo; the
actual modeling code is pulled via `auto_map` from a *different* repo —
verified in the local HF cache: the executed code lives under
`transformers_modules/nomic_hyphen_ai/nomic_hyphen_bert_hyphen_2048/...`,
i.e. `nomic-ai/nomic-bert-2048` at its current head, which nothing pins. A
compromise of that second repo runs arbitrary Python at build time, in a
repo whose every other download path (download-model.sh, pdfjs, leaflet) is
sha256-pinned. The comment "Both pins are deliberate" is therefore
misleading. Fix: pin via `code_revision`, or run with `HF_HUB_OFFLINE=1`
after first fetch, or document the accepted risk.
### 1.4 Working-tree commit hazard: tracked templates reference untracked files — **HIGH (process)**
`templates/partials/nav.html:5` (tracked, modified) adds
`$partial("templates/partials/logo-mark.svg")$` and
`templates/partials/head.html` references `/og-image.png` — both target
files are **untracked** (no git history). Committing the template diff
without `git add`-ing both breaks every page's Hakyll build on a fresh
clone (`$partial$` aborts compilation) and 404s the og:image. They must
land in the same commit. Conversely, `data/embed-cache-pages.npz.tmp.npz`
must **not** be committed (see §4.1). The partial itself is safe as a
Hakyll template (verified: zero `$` characters; `match "templates/**"`
compiles it).
### 1.5 `einops` dependency: undocumented, unbounded, imported nowhere — **LOW**
`pyproject.toml:27` adds `einops>=0.8.2`. No import anywhere in
`tools/`/`build/`/`static/js/`; its only consumer is nomic's
`trust_remote_code` module (§1.3). Every sibling dependency has an
explanatory comment and an upper bound per the file's own stated policy
("Upper bounds are intentionally generous (next major) but always
present"); einops has neither. `uv lock --check` passes (0.8.2 pinned).
---
## 2. Haskell build code — core
### 2.1 Nav, home grid, and library link `/fiction/` and `/poetry/` — confirmed 404s — **MED**
`build/Site.hs:50-60` (`homePortals` contains `("Fiction","fiction")`,
`("Poetry","poetry")`), `templates/partials/nav.html:56,61`,
`templates/library.html:44,58`. No rule generates either index: fiction and
poetry are not in `tagIndexable` (`build/Patterns.hs:148-151` = essays +
blog + photos) and Site.hs has no landing rule. Verified: `_site/fiction`
does not exist; `_site/poetry/` has no `index.html`. nginx has no
redirects. Both links 404 in production today.
### 2.2 Tag/route collisions guarded for `photography` only — **MED**
`build/Tags.hs:98-99`. `tagIdentifier` maps tag `t` → `t ++ "/index.html"`;
`sectionOwnedTopLevelTags = ["photography"]` is the only guard. A
tagIndexable item tagged `music` (or `music/x`, which expands to `music`)
emits `music/index.html`, already owned by the music index route
(`build/Site.hs:486-487`); similarly `essays`, `blog`, `cv`, `archive`,
`authors`, `bibliography`. Hakyll does not error on duplicate routes — one
silently overwrites the other.
### 2.3 Sidenotes filter destroys the documented no-JS fallback — **MED**
`build/Filters/Sidenotes.hs:30-36` vs `static/css/sidenotes.css:125-135`.
The module doc claims the Pandoc `<section class="footnotes">` "serves as
fallback," but `apply` replaces every `Note`, so the writer never emits the
section. CSS depends on it below 1500px. Verified in output:
`_site/essays/scaling_outage.html` has 3 `class="sidenote"` and zero
`footnotes` occurrences. With JS disabled, footnote content is invisible on
narrow viewports. The comment, the CSS, and ozymandias.md's own prose all
contradict actual behavior.
### 2.4 Sidenote bodies rendered without the KaTeX writer — **MED**
`build/Filters/Sidenotes.hs:103-115`. `inlinesToHtml`/`blocksToHtml` use
`writeHtml5String (def :: WriterOptions)` (PlainMath), while the main
pipeline uses `KaTeX ""` (`build/Compilers.hs:47`). Math inside a footnote
never gets `<span class="math inline">\(...\)</span>`, so KaTeX never
renders it — degrades to plain italics, silently inconsistent with body
math.
### 2.5 SourceRefs whitelist vs `/source/` serving whitelist have drifted — **MED**
`build/Filters/SourceRefs.hs:114-141` vs `build/Site.hs:217-240`. Site.hs:209
says "must stay aligned with 'isSourcePath'". Mismatches: SourceRefs wraps
`content/` and `yaml-source/` (no Site counterpart); `static/` + any known
ext vs Site's `static/js/**`/`static/css/**` only; `tools/` + any ext vs
Site's `tools/**.sh`/`tools/**.py`; `data/` at any depth vs Site's
top-level `data/*.{json,yaml,md,bib}`. Each mismatch yields a wrapped
source-ref whose popup fetch 404s (Forgejo href fallback still works).
Inverse: Site serves `data/*.bib` but `.bib` is missing from
`hasKnownExt` — dead whitelist entry.
### 2.6 `epistemicEntry` ignores `confidence: proved` — **MED**
`build/Site.hs:1014-1024`. Comment: "Compute overall-score the same way
Contexts.overallScoreField does," but it uses
`readMaybe =<< lookupString "confidence" meta`, which is `Nothing` for
`"proved"`/`"proven"`, whereas `Contexts.overallScoreField`
(`build/Contexts.hs:574-576`) substitutes 100 via `isProvedConfidence`.
Proved pages get no `score` in `data/epistemic-meta.json` and export the
raw string under `confidence`, so client-side filtering silently misses
them.
### 2.7 Empty affiliation `<div>` ships on every essay without `affiliation:` — **MED**
`build/Contexts.hs:84-89` + `templates/partials/metadata-tail.html:12`.
`affiliationField` returns an empty list instead of `noResult`; Hakyll's
`$if$` is truthy for empty list fields (the codebase knows this —
`tagLinksFieldExcludingScope` uses `noResult` for exactly this reason).
Verified in output: `_site/essays/asymmetric-forgetting.html` contains
`<div class="meta-row meta-affiliation">` with whitespace-only content.
### 2.8 Library page hard-depends on `content/library.md` — **LOW**
`build/Site.hs:675`. `_ <- loadSnapshot libraryIntroId "body"` is a
top-level compiler statement (not inside a `field`), so it's a hard
failure. The block is documented as "optional prose block"; deleting
`content/library.md` breaks the whole `library.html` compile. Contrast the
existence-guarded sidecars at `build/Tags.hs:277-283` and
`build/Site.hs:843-850`.
### 2.9 Library `primaryPortalOf` reads only list-form `tags:` — **LOW**
`build/Site.hs:632-638`. `lookupStringList "tags"` returns `Nothing` for
scalar comma form (`tags: research, ai`), which Hakyll's `getTags`
accepts. Such an item appears on tag pages but is silently dropped from
the library. All current content uses list form — latent.
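A sketch of a tag reader that accepts both forms, written in Python for illustration. The `get_tags` name mirrors the Hakyll function mentioned above, but this implementation and its whitespace handling are assumptions, not a port of Hakyll's code.

```python
from typing import List, Union


def get_tags(raw: Union[str, List[str], None]) -> List[str]:
    """Normalise a `tags:` value given as a YAML list or a comma-separated scalar."""
    if raw is None:
        return []
    # Scalar form ("research, ai") is split on commas; list form is used as-is.
    parts = raw.split(",") if isinstance(raw, str) else raw
    return [p.strip() for p in parts if p.strip()]
```

Routing every consumer through one such function keeps the library and the tag pages from disagreeing about which items carry which tags.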
### 2.10 `allContent` omits me/, memento-mori/, photography from the link graph — **LOW**
`build/Patterns.hs:124-133`, used by `build/Backlinks.hs:334,345`. Despite
"Every content file the backlinks pass should index," `content/me/index.md`
and `content/memento-mori/index.md` (full essays, rendered with
`backlinksField`) never have their outgoing links extracted; photography
likewise. Either deliberate-but-undocumented or the exact silent omission
the module header says it exists to prevent.
### 2.11 Paginated tag pages: split by creation date, sorted by display date — **LOW**
`build/Tags.hs:371-377`. `buildPaginateWith (sortAndGroupAt tagPageSize)`
partitions via `sortRecentFirst` (creation date), then each page re-sorts
with `recentFirstByDisplay` (revision-aware). A recently revised old item
stays on a late page but jumps to its top — cross-page ordering is not
monotone. Only fires above the 150-item threshold.
### 2.12 `fill:#000` replacement corrupts longer hex colors — **LOW**
`build/Filters/Score.hs:118-133` (and `Filters/Viz.hs` `processColors`).
The 6-digit pass protects only `#000000`; for `fill:#000080` the 3-digit
pass produces `fill:currentColor080`, which is invalid CSS and silently mangles the SVG.
Quoted attribute forms are safe; only unquoted style-property forms are
exposed.
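
One way to close the gap is to refuse a match whenever another hex digit follows. The sketch below models the failure and a guarded alternative with Python regexes; the pattern and function names are assumptions for illustration, not the code in `Filters/Score.hs`.

```python
import re

# Naive 3-digit pass: also matches the first three digits of a longer color.
NAIVE = re.compile(r"fill:#000")

# Guarded pass: accept #000 or #000000 only when no further hex digit follows,
# so 4-, 6- and 8-digit colors that merely start with zeros are left alone.
GUARDED = re.compile(r"fill:#000(?:000)?(?![0-9A-Fa-f])")


def recolor_naive(style: str) -> str:
    return NAIVE.sub("fill:currentColor", style)


def recolor_guarded(style: str) -> str:
    return GUARDED.sub("fill:currentColor", style)
```

The negative lookahead makes a single pass sufficient, so no separate protection step for `#000000` is needed.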
### 2.13 Source-level preprocessors rewrite inside fenced code blocks — **LOW**
`build/Filters/Wikilinks.hs:24-31`, `Filters/Transclusion.hs:18-20`,
`Filters/EmbedPdf.hs`. All run on the raw source before Pandoc parses
fences: `[[anything]]` in a code block becomes a link; a code-block line
that is exactly `{{slug}}` or `{{pdf:...}}` becomes raw HTML.
Transclusion's comment ("prevents accidental substitution inside prose or
code") is false for full-line directives in code blocks. A live foot-gun
for a site that documents its own syntax (ozymandias.md does exactly
this).
### 2.14 `domainIcon` matches substrings of the whole URL, not the host — **LOW**
`build/Filters/Links.hs:120-153`. `"x.com" `T.isInfixOf` url` etc. —
`https://example.org/why-x.com-failed` gets the Twitter icon. Contradicts
the strict-hostname discipline `isExternal` documents at lines 95-101 of
the same file. Cosmetic (icon only).
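
A host-only comparison avoids the false positive. This is a Python sketch of the idea, with `host_matches` as a hypothetical helper; the Haskell fix would parse the URL and compare the authority the same way.

```python
from urllib.parse import urlsplit


def host_matches(url: str, domain: str) -> bool:
    """True when the URL's host is `domain` itself or one of its subdomains."""
    host = (urlsplit(url).hostname or "").lower()
    return host == domain or host.endswith("." + domain)
```

The leading dot in the suffix test keeps `notx.com` from matching `x.com`.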
### 2.15 `gsubRoute "content/"` strips every occurrence, not just the prefix — **LOW**
`build/Site.hs:171,357,417` etc. Hakyll's `gsubRoute` is replace-all; a
co-located directory literally named `content` would be silently mangled
(`content/essays/slug/content/data.csv` → `essays/slug/data.csv`). Same
for `gsubRoute "static/"`. Improbable but silent.
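
The difference between replace-all and prefix-only stripping can be shown with plain strings. The Python below is only a model of the two behaviours; `strip_all` and `strip_prefix` are hypothetical names.

```python
def strip_all(path: str, needle: str = "content/") -> str:
    """Replace-all semantics: every occurrence of `needle` disappears."""
    return path.replace(needle, "")


def strip_prefix(path: str, prefix: str = "content/") -> str:
    """Prefix-only semantics: remove `prefix` once, and only at the start."""
    return path[len(prefix):] if path.startswith(prefix) else path
```

On `content/essays/slug/content/data.csv` the first returns `essays/slug/data.csv` and the second returns `essays/slug/content/data.csv`.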
### 2.16 `existsCached` memoizes non-existence for the process lifetime — **LOW**
`build/Filters/SourceRefs.hs:160-166`. Under `make watch`, a source file
created after first reference stays cached as absent until restart.
### 2.17 Core NITs
- `build/Site.hs:42-44`: comment says "eight portals"; the list has nine.
Echoed at Site.hs:606 ("the eight") vs line 657's "nine times".
- `build/Site.hs:866-877`: random-pages.json comment says "essays + blog
posts only" but the rule loads fiction and flat poetry too; uses
flat-only `content/poetry/*.md` while the epistemic rule uses
`allPoetry` — collection poems are epistemic-indexed but never
randomizable.
- `build/Utils.hs:64-73`: `authorSlugify` comment claims runs of spaces
collapse; code maps each space (`"A B"` → `"a--b"`). Consistent
everywhere, so links work; comment wrong.
- `build/Utils.hs:31-32`: `readingTime` truncates (`div 200`) — 399 words
reports "1 min"; comment implies ceiling semantics.
- `build/Pagination.hs:42` + `build/Site.hs:77-82`: hardcoded pattern
literals duplicate `Patterns.hs`, defeating that module's stated purpose
(Patterns.hs:6-10).
- `build/Contexts.hs:174-180`: plain `tagLinksField` returns an empty list
rather than `noResult`, so `$if(item-tags)$` is true and templates emit
empty tag wrappers (author-index.html, item-card.html).
- `build/Tags.hs:296-304`: `tagItemCtx` composes `defaultContext`, not
`siteCtx`, so `$if(has-monogram)$` never fires on tag pages — monograms
render on new.html/library but silently never on tag indexes.
- `build/Contexts.hs:485-492`: `dotsField` comment says "1–5" but accepts
0 (`max 0 (min 5 n)`) — `importance: 0` renders five empty circles.
- `build/Contexts.hs:375-381`: `descriptionField` doc says `noResult`;
code uses `fail` — behaviorally fine under Hakyll 4.16 `$if$` (verified
against Hakyll 4.16.7.1 source) but logs `[ERROR]` debug noise per
abstract-less page. Same in `abstractField`, `summaryField`,
`bibliographyField`.
- `build/Filters/Images.hs:233-234`: `webpSrc` interpolated into `srcset`
unescaped while sibling `src` goes through `esc`.
- `build/Filters/Links.hs:37-46,63-69`: internal PDF links double-classified
(`pdf-link` + `link-internal` chrome) despite the "no overlap" comment.
- `build/Filters/Smallcaps.hs:31-34` + `Filters/Archive.hs:42-44`:
"headers are skipped" only at top level; a Header nested in a
Div/BlockQuote is processed, contradicting the comments.
Verified clean: no unguarded `head`/`fromJust`/`read`/`!!` hazards in the
core modules; filter composition order matches its documenting comments;
Hakyll 4.16.7.1 `$if$` treats both `fail` and `noResult` as false.
---
## 3. Haskell build code — feature modules
### 3.1 Stats heatmap day-of-week off-by-one: Sunday clipped out of the SVG — **MED**
`build/Stats.hs:185,300,317`. `dowOf d = fromEnum (dayOfWeek d) -- Mon=0..Sun=6`
— but `time-1.12.2` is ISO-numbered (verified:
`map fromEnum [Monday..Sunday] == [1..7]`). So Sunday lands at y=106 while
`svgH` = 104 — every Sunday cell is clipped out of the viewBox and grid
row 0 is permanently blank. Relatedly, `weekStart` returns the previous
*Sunday* (and for a Sunday, 7 days back), not the "first Monday on or
before" its comment claims; builds run on a Sunday also clip the newest
column horizontally.
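
Python's `date.isoweekday()` uses the same ISO numbering (Monday=1 through Sunday=7), so it can model the intended arithmetic. The helper names are hypothetical; the point is the explicit `- 1` and a week start derived from the same row index.

```python
from datetime import date, timedelta


def row_of(d: date) -> int:
    """Zero-based grid row with Monday at 0 and Sunday at 6."""
    return d.isoweekday() - 1  # ISO numbering is 1..7, so shift down by one


def week_start(d: date) -> date:
    """First Monday on or before `d` (a Monday maps to itself)."""
    return d - timedelta(days=row_of(d))
```

Deriving the column origin from `row_of` keeps the two calculations from drifting apart again.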
### 3.2 `Commonplace.hs` uses `Char8.pack` — non-ASCII YAML corruption — **MED**
`build/Commonplace.hs:143`. `Y.decodeEither' (BS.pack raw)` with
`Data.ByteString.Char8` truncates each `Char` to 8 bits — the exact hazard
`build/Now.hs:249-253` documents and fixes with `TE.encodeUtf8`.
`data/commonplace.yaml` is currently pure ASCII, so latent — but a
commonplace book of quotations is the likeliest file to acquire an em-dash
or curly quote, which will then either fail the YAML parse or publish
mojibake.
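
The truncation is easy to model. The sketch below is Python, not the Haskell in question: `pack_char8` imitates a pack that keeps only the low 8 bits of each code point, and `pack_utf8` is the encoding-aware alternative.

```python
def pack_char8(text: str) -> bytes:
    """Model of a Char8-style pack: each character is cut down to one byte."""
    return bytes(ord(ch) & 0xFF for ch in text)


def pack_utf8(text: str) -> bytes:
    """Encoding-aware pack: every code point survives a round trip."""
    return text.encode("utf-8")
```

An em-dash (U+2014) becomes the single control byte `0x14` under the first function, while pure ASCII input is identical under both, which is why the problem stays latent.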
### 3.3 Backlinks: links inside tight lists are invisible — **MED**
`build/Backlinks.hs:220-226`. `extractLinksWithContext`'s `go` handles
`Para`, `BlockQuote`, `Div`, `BulletList`, `OrderedList`, then `go _ = []`.
Tight list items (the default `- item` form) are `Plain` blocks, not
`Para`, so recursion into list children yields nothing. Every internal
link written in a tight list never produces a backlink. `Header`, `Table`,
and `DefinitionList` blocks are likewise skipped. The doc comment implies
coverage it doesn't deliver.
### 3.4 Stability "age" is the first→last commit span, not time since first commit — **MED**
`build/Stability.hs:89-93,99-112`. Docs say "age in days since first
commit," but `classify (length dates) (daySpan (last dates) newest)`
computes the span between first and most recent *commit*, with no
reference to today. A piece written in a one-week burst years ago reports
"volatile" forever; time passing without commits can never increase
stability. Either the comment or the metric is wrong.
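
The two candidate metrics differ only in their right-hand endpoint. A Python sketch with hypothetical names, assuming a non-empty list of commit dates:

```python
from datetime import date


def commit_span_days(dates: list) -> int:
    """Days between the first and the most recent commit (the computed value)."""
    return (max(dates) - min(dates)).days


def age_days(dates: list, today: date) -> int:
    """Days since the first commit (the documented value)."""
    return (today - min(dates)).days
```

For a piece committed over one week in January 2020, the span stays at 7 indefinitely, while the age keeps growing with `today`.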
### 3.5 Frontmatter `history:` assumed newest-first; WRITING.md documents oldest-first — **MED**
`build/Stability.hs:204-217,299-336` vs `WRITING.md:105-109`.
`loadVersionHistory` keeps authored order and all range fields treat the
head as newest (`es@(newest:_) -> let oldest = last es`). Git history is
newest-first, but WRITING.md's `history:` example is oldest-first. With
the documented ordering, `version-history-range` renders reversed
("14 March 2026 – 1 March 2026"), `range-start` returns the newest date,
and `version-history-primary` shows the three *oldest* entries.
### 3.6 Archive manifest→provenance join is exact-string, rest of system is normalized — **MED**
`build/Archive.hs:269`. `Map.lookup (meUrl me) provByUrl` joins on the raw
URL; everywhere else equivalence is `normalizeUrl` (ArchiveIndex
filtering, dup detection, ARCHIVE.md:189-192). Editing a manifest URL to a
normalization-equivalent form (`http`→`https`, trailing slash, tracking
param) silently unpublishes `/archive/<slug>/` while ArchiveIndex's
normalized filter keeps the slug active — links keep pointing at a 404.
### 3.7 Photography `buildPin` computes wrong slug/thumb/title for flat entries — **MED**
`build/Photography.hs:354,362`. `slug = takeFileName (takeDirectory fp)`:
for a flat `content/photography/foo.md` this yields `"photography"`, so
map.json gets `"slug": "photography"`, the title fallback is wrong, and
`thumb = "/photography/photography/<p>"` 404s (flat-single assets route to
`/photography/<asset>`). PHOTOGRAPHY.md:214 explicitly supports flat
singles. Latent (`content/photography/` currently has only `index.md`),
but breaks the first geo-tagged flat single.
### 3.8 `geo-precision` fails open: a typo'd "hidden" publishes coordinates — **MED**
`build/Photography.hs:347-349,312-320`. Only the exact string matches
(`(_, Just "hidden", _) -> return Nothing`); any other value (e.g.
`Hidden`, `hiddn`) falls into `roundCoord`, whose catch-all treats unknown
values as `city` (~10 km rounding) — publishing coordinates the author
meant to suppress. Contradicts the file's own privacy comment (lines
287-289) and the fail-closed precedent for `visibility:` in
`build/Archive.hs:77-83`.
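
A fail-closed version treats every unrecognized value as a request to suppress. The Python below is a sketch under stated assumptions: only the `hidden` and `city` levels come from the finding, the one-decimal rounding stands in for the ~10 km figure, and accepting `Hidden` case-insensitively is a design choice, not documented behaviour.

```python
# Decimal places per recognized precision level. Only "city" is taken from the
# finding; any further levels would be added here explicitly.
ROUNDING = {"city": 1}


def publishable_coords(lat: float, lon: float, precision: str):
    """Return rounded (lat, lon), or None when coordinates must not be published."""
    key = precision.strip().lower()
    if key == "hidden":
        return None
    if key not in ROUNDING:
        # Fail closed: a typo such as "hiddn" suppresses rather than publishes.
        return None
    places = ROUNDING[key]
    return (round(lat, places), round(lon, places))
```

A build warning on the unknown-value branch would surface the typo without leaking the location.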
### 3.9 Archive state is process-lifetime cached — `watch` goes stale — **LOW**
`build/ArchiveIndex.hs:123-146` + `build/Archive.hs:304`.
`activeUrls`/`rawIndex`/`rawState` are NOINLINE `unsafePerformIO` CAFs read
once per process, and `archiveRules` reads the manifest in `preprocess`.
Under `site watch`, edits to `manifest.yaml`, `removed.yaml`, or the
regenerated state JSONs are never re-read until restart. One-shot builds
unaffected.
### 3.10 Pinned pages render raw ISO in `$last-reviewed$` — **LOW**
`build/Stability.hs:166-170`. The git branch formats via `fmtIso`
("1 May 2026"); the IGNORE.txt-pinned branch returns the frontmatter value
verbatim ("2026-05-01") — inconsistent display formatting.
### 3.11 Empty/all-comments `manifest.yaml` halts the build — **LOW**
`build/Archive.hs:158-170`. An empty YAML stream decodes as `Null`, which
fails to parse as `[ManifestEntry]` and takes the `exitFailure` branch —
draining the manifest to zero entries is fatal rather than the empty
archive the absent-file branch supports.
### 3.12 Backlinks `normaliseUrl` misses directory-form canonical URLs — **LOW**
`build/Backlinks.hs:275-281`. Strips `.html` but not
`index.html`/trailing slash: a page routed `essays/foo/index.html` keys as
`/essays/foo/index`, but a body link authored `/essays/foo/` doesn't
match — backlink silently dropped. `build/SimilarLinks.hs:97-99` handles
exactly this case and its comment flags the divergence.
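
A key function that folds all three spellings together might look like the sketch below. It is Python for illustration only; `canonical_key` is a hypothetical name, and it assumes that `/essays/foo.html` and `/essays/foo/` never coexist as distinct pages.

```python
def canonical_key(path: str) -> str:
    """Map /x/index.html, /x/ and /x.html to the same lookup key, /x."""
    if path.endswith("/index.html"):
        path = path[: -len("index.html")]  # keep the slash for the next step
    elif path.endswith(".html"):
        path = path[: -len(".html")]
    if len(path) > 1:
        path = path.rstrip("/")
    return path or "/"
```

Applying the same function to both the routed page and the authored link is what makes the join reliable.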
### 3.13 SimilarLinks PDF viewer URL not percent-encoded — **LOW**
`build/SimilarLinks.hs:155-164`.
`viewerUrl = "/pdfjs/web/viewer.html?file=" ++ escapeHtml raw`:
`escapeHtml` handles HTML metachars only; a path containing `&`, `?`, `#`,
or spaces breaks the `file=` query value.
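
The two escaping layers are separate concerns: percent-encoding protects the query value, and HTML escaping protects the attribute. A Python sketch of the ordering, with hypothetical function names and an invented sample path:

```python
from html import escape
from urllib.parse import quote

VIEWER = "/pdfjs/web/viewer.html?file="


def viewer_url(pdf_path: str) -> str:
    """Percent-encode the path so &, ?, # and spaces stay inside `file=`."""
    return VIEWER + quote(pdf_path, safe="/")


def viewer_href(pdf_path: str) -> str:
    """HTML-escape the finished URL for use inside an href attribute."""
    return escape(viewer_url(pdf_path), quote=True)
```

Encoding first and escaping second means neither step has to know about the other.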
### 3.14 Photography feed thumbnails only for directory-form entries — **LOW**
`build/Photography.hs:449-453`. `imgTag` requires `isDir`; flat singles
and series children (`<series>/<photo>.md`) get text-only feed entries,
against PHOTOGRAPHY.md's "thumbnails embedded inline" (lines 36, 445) and
the feed's deliberate inclusion of series children.
### 3.15 Marks: missing confidence/evidence renders a literal "0 TRUST" — **LOW**
`build/Marks.hs:272-278,565`. `computeTrust _ _ = 0` with a comment
claiming the figure "collapses to the bare frame," but
`renderEpistemicFigure` unconditionally calls `renderTrustLabel`, so a
piece with `status:` but no `confidence`/`evidence` (a case MARKS.md:696
says should render) displays a prominent center "0" — indistinguishable
from an authored zero-trust score.
### 3.16 Feature-module NITs
- `build/Catalog.hs:228-235`: two distinct unknown categories render as
adjacent duplicate "Other" sections (equal rank, `groupBy` on raw
string).
- `build/Stats.hs:754-777`: `pageTOC` comment says "nine h2 sections";
lists eleven (matching the eleven rendered).
- `build/SimilarLinks.hs:51-54`: comment says "the template caps the
display"; the code caps it (`take maxSimilar` at line 80).
- `build/Stats.hs:169-171`, `build/Archive.hs:564-569`: "median" is the
upper-median for even-length lists.
- `build/Backlinks.hs:133-153`: protocol-relative `//host/path` URLs pass
`isPageLink` and pollute backlinks.json.
- `build/BibExtras.hs:75-98`: `@string`/`@comment`/`@preamble` blocks
parsed as citekey entries — only consequential on a citekey/macro-name
collision.
Verified clean: Marks tick positions/axis order/radii match MARKS.md §3;
proved-confidence trust substitution matches §4.3; Archive's fail-closed
`visibility` validation, removed.yaml conflict rejection, and double-sided
SHA-256 verification all match ARCHIVE.md.
---
## 4. Python & shell tooling
### 4.1 `data/embed-cache-pages.npz.tmp.npz` orphan: explained; cleanup + ignore gaps — **MED**
The orphan (mtime May 26) is the fossil of a fixed bug: an earlier
embed.py passed a bare path to `np.savez_compressed`, numpy appended
`.npz` (verified in numpy's `_savez` source), and the subsequent
`os.replace` raised FileNotFoundError, stranding the file. The current
file-handle code (`tools/embed.py:173-183`) is correct, but: (a) nothing
deletes the stale orphan — **delete it, don't commit it**; (b) the tmp
write has no try/finally, so any mid-write exception strands
`embed-cache-pages.npz.tmp`; (c) the new `.gitignore` entry is exact-path
(`data/embed-cache-pages.npz`) and covers neither `.tmp` nor `.tmp.npz`
variants — widen to `data/embed-cache-pages.npz*`; (d) the fixed tmp name
means two concurrent runs interleave writes.
### 4.2 Corrupt embed cache crashes instead of being discarded — **MED**
`tools/embed.py:154`. The discard path catches
`(OSError, KeyError, ValueError)`, but `np.load` on a truncated `.npz`
raises `zipfile.BadZipFile` (verified MRO: `BadZipFile → Exception`), and
`EOFError` is also uncaught. A half-written cache (exactly what §4.1(b)
can produce) makes every subsequent build print "Warning: embedding
failed" and leaves similar-links/semantic index stale until the file is
manually deleted — the opposite of the docstring's "unreadable →
discarding" contract.
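
A discard path that matches the docstring has to name the two missing exception types. The sketch below is an assumption about shape, not embed.py's code: `load_or_discard` and `CACHE_ERRORS` are hypothetical, and the loader is passed in so the example does not depend on numpy.

```python
import zipfile

# BadZipFile derives directly from Exception, so it must be listed explicitly;
# EOFError covers a file that ends in the middle of a member.
CACHE_ERRORS = (OSError, KeyError, ValueError, EOFError, zipfile.BadZipFile)


def load_or_discard(path, loader):
    """Return loader(path), or None when the cache is unreadable."""
    try:
        return loader(path)
    except CACHE_ERRORS:
        # Unreadable cache: signal the caller to rebuild from scratch.
        return None
```

Returning `None` lets the caller fall through to a full recompute, which is the behaviour the docstring promises.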
### 4.3 embed.py staleness check structurally defeated by stamp-build-time — **MED**
`tools/embed.py:195-200` + `Makefile:68`. `needs_update()` compares
`_site/**/*.html` mtimes against embed's outputs — but the build order is
`embed.py` → `stamp-build-time.py _site`, and the stamper rewrites the
footer timestamp in essentially every HTML file each build. So every page
is always newer than embed's outputs and the "skip if fresh" fast path
never fires: the full paragraph-embedding pass (and model load) runs on
every build. The new page cache papers over half the cost; the paragraph
pass pays full price every time. Related (`tools/embed.py:297-299`):
model/config changes never invalidate outputs — currently masked by this
bug; fixing one exposes the other.
### 4.4 archive.py writes provenance/index/state non-atomically — **MED**
`tools/archive.py:718-721,734-737,953-957,1077-1080`. All plain
`write_text()`. An interrupt mid-write truncates `PROVENANCE.json`; the
next build's `json.loads` (line 642) raises an unhandled
`JSONDecodeError` — and a truncated provenance is indistinguishable from
corruption in a tool whose whole contract is integrity checking. embed.py
got atomic-write helpers; archive.py did not.
### 4.5 download-leaflet.sh: checksum verification bypassable — **MED**
`tools/download-leaflet.sh:43-47,90`. The early-exit skip checks file
existence only (download-model.sh re-verifies on its skip path), and
`curl -o "$target"` writes directly to the final path: a download that
*fails* `verify_or_warn` aborts via `set -e` *after* the bad file is in
place, and the next run's existence check accepts it permanently. A
MITM'd unpkg.com download survives one failed run and is silently
vendored on the next.
### 4.6 Other download/convert scripts leave partial files in final paths — **LOW**
`tools/download-model.sh:84`: interrupted curl leaves a partial
`model_quantized.onnx`; caught today only because model-checksums.sha256
pins all five files — any unpinned file would persist forever. Use
`-o "$dst.part" && mv`. `tools/convert-images.sh:33`: interrupted cwebp
leaves a partial `.webp` that the `-nt` staleness gate then skips forever
— a truncated WebP ships until manually deleted.
### 4.7 archive.py robustness gaps — **LOW**
- `tools/archive.py:788,795-799`: provenance missing the `artifact` key
makes `prev_artifact == slug_dir`, then `sha256_of` raises an uncaught
`IsADirectoryError` instead of the structured "prior snapshot
incomplete" error.
- `tools/archive.py:614-617,938-940,1066-1068`: non-dict manifest entries
(`- https://example.com` instead of `- url: ...`) crash with
`AttributeError: 'str' object has no attribute 'get'`.
- `tools/archive.py:896`: `wayback_save` concatenates the raw URL
(contrast `wayback_lookup` at 909, which uses `quote(url, safe="")`).
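
For the non-dict manifest entries in the second item, an up-front shape check turns the `AttributeError` into a readable report. A minimal sketch, assuming the manifest has already been parsed into Python objects; `manifest_urls` is a hypothetical helper.

```python
def manifest_urls(entries):
    """Split parsed manifest entries into usable URLs and problem messages."""
    urls, problems = [], []
    for index, entry in enumerate(entries or []):
        # A bare string such as "- https://example.com" parses as str, not dict.
        if not isinstance(entry, dict) or not isinstance(entry.get("url"), str):
            problems.append(
                f"entry {index}: expected a mapping with a 'url' key, got {entry!r}"
            )
            continue
        urls.append(entry["url"])
    return urls, problems
```

The caller can then decide whether any problem is fatal, instead of crashing at the first `.get`.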
### 4.8 add-popup-source.sh: dead CSP reminder + unvalidated nginx interpolation — **LOW**
`tools/add-popup-source.sh:214`: the connect-src reminder gates on
`[[ "$NEEDS_PROXY" -eq 0 && -n "$UPSTREAM_HOST" ]]`, but `UPSTREAM_HOST`
is only set in the `NEEDS_PROXY -eq 1` branch (lines 124-131) — the
reminder can never print, and the no-proxy case is exactly when it's
needed (the provider will be CSP-blocked with no hint). Line 71: `NAME`
from a free-text prompt is interpolated into
`location /proxy/$NAME/`/`set $upstream_$NAME` with no
`^[a-z0-9-]+$` validation (import-photo.sh validates; this doesn't).
### 4.9 refreeze.sh deletes the freeze before the replacement succeeds — **LOW**
`tools/refreeze.sh:13-16`. `rm -f "$FREEZE"` then `cabal freeze`; a failed
resolve leaves no freeze file (recoverable via git, but write-temp-then-move
is safer).
### 4.10 embed.py / atomic-write NITs — **LOW/NIT**
`tools/embed.py:109-115`: `atomic_write_bytes` uses a fixed `.tmp` name
(concurrent-run collision) and no `fsync` before `os.replace` (power loss
can leave an empty target). Same pattern in `_atomic_write_yaml` of
extract-exif.py:377, extract-palette.py:65, extract-dimensions.py:65.
`tools/embed.py:144`: NpzFile never closed — use
`with np.load(...) as npz:`.
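
The three gaps (fixed temp name, no cleanup, no `fsync`) can be closed in one helper. The following is a sketch under assumptions, not the repository's implementation: the signature and the `durable` flag are invented for illustration.

```python
import os
from pathlib import Path


def atomic_write_bytes(target: Path, data: bytes, durable: bool = True) -> None:
    """Write `data` to `target` via a PID-unique temp file and os.replace."""
    # PID in the name keeps two concurrent runs from sharing a temp file.
    tmp = target.with_name(f"{target.name}.{os.getpid()}.tmp")
    try:
        with open(tmp, "wb") as fh:
            fh.write(data)
            if durable:
                # Push the bytes to disk before the rename makes them visible.
                fh.flush()
                os.fsync(fh.fileno())
        os.replace(tmp, target)
    finally:
        # After a successful replace the temp name is already gone.
        try:
            tmp.unlink()
        except FileNotFoundError:
            pass
```

Passing `durable=False` skips the `fsync` for artifacts that are cheap to regenerate.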
### 4.11 Tooling NITs
- `tools/import-photo.sh:147-155`: on `mogrify -strip` failure the
EXIF-laden JPEG (GPS, serials) remains under `content/`, where
`make build`'s `git add content/` could auto-commit it. Delete `$TARGET`
on that failure path.
- `tools/hooks/pre-commit-marks.sh:28-31`: `awk '{ print $2 }'` truncates
paths with spaces; the `status:` probe reads the working tree, not the
staged blob. Advisory-only hook.
- `tools/preset-signing-passphrase.sh:30`: `echo -n "$PASSPHRASE"` eats a
passphrase starting with `-e`/`-n`/`-E`; use `printf '%s'`.
- `tools/stamp-build-time.py:52-54`: in-place non-atomic rewrite of
`_site/` HTML.
- `tools/archive.py:244`: `pdftotext` without `--`; a slug starting with
`-` parses as an option. Same in extract-exif.py:159.
- `tools/monolith-version.txt` records a sha256 (matches the binary
today, verified) but `find_monolith()` never checks it.
Verified clean: sign-site.sh (atomic sig writes, post-pass manifest
verification); compress-assets.sh and download-pdfjs.sh (mktemp + EXIT
trap, hash verified before extraction); audit-marks.py, viz_theme.py,
extract-dimensions.py, extract-palette.py; embed.py's faiss `-1` padding
is safely filtered; `uv lock --check` passes; model-checksums.sha256 pins
all five model files.
---
## 5. Frontend JavaScript
### 5.1 Score-reader pages never restore theme/settings — **MED**
`templates/score-reader-default.html:10` + `static/js/theme.js:12-13`. The
template loads `theme.js` without `utils.js` (unlike head.html:66-67), so
`window.lnUtils.safeStorage` is undefined and theme/text-size/focus-mode/
reduce-motion all silently fail to restore — a dark-theme user gets a
light flash-and-stay on every score page. Compounding: settings.js (line
15; the template does render the settings toggle) falls back to its no-op
store, so theme picks made on score pages never persist either.
### 5.2 search-filters.js: epistemic filters silently bypass clean-URL pages — **MED**
`static/js/search-filters.js:117-125`. `normUrl()` returns `u.pathname`
verbatim and looks it up in `epistemicMeta[url]`. Verified:
`_site/data/epistemic-meta.json` keys include
`/essays/beyond-comorbidity-indices/index.html` while rendered result
links use `/essays/beyond-comorbidity-indices/`. The lookup misses,
`passes(null)` returns true ("no metadata = don't filter"), so every
directory-style page bypasses all active epistemic filters. Flat `.html`
pages match fine, which hides the bug.
### 5.3 viz.js ignores the cappuccino theme — **MED**
`static/js/viz.js:94-99`. `isDark()` knows only
`'dark'`/`'light'`/OS-preference, but theme.js/settings.js support
`'cappuccino'` — a dark-brown theme (`--bg: #553a28`, base.css:203). With
OS-light + cappuccino, charts render the LIGHT config (near-black marks
and axis labels) on a dark background.
### 5.4 collapse.js localStorage keys collide across pages — **MED**
`static/js/collapse.js:44,83`. Key is
`'section-collapsed:' + heading.id` with no pathname namespace (contrast
annotations.js). Pandoc auto-slugs (`#introduction`, `#background`) recur
across essays, so collapsing "Introduction" on one essay collapses it
everywhere. Also uses raw `localStorage` rather than
`lnUtils.safeStorage`.
### 5.5 semantic-search.js: stale-response race + duplicate index fetch — **MED**
`static/js/semantic-search.js:117-144`. `runSearch` has no generation
token; overlapping queries render in promise-resolution order, so an
older query's hits can replace a newer one's (with `setStatus('')`
masking it). `loadIndex()` (42-59) has no in-flight-promise dedup (unlike
`loadModel`'s `loadModelPromise`), so concurrent first searches fetch
`semantic-index.bin` + `semantic-meta.json` twice.
### 5.6 lightbox.js: aria-modal with no focus trap, no keyboard activation — **MED**
`static/js/lightbox.js`. Overlay sets `role="dialog"` +
`aria-modal="true"` but has no Tab handling (gallery.js's `trapTab` at
235-257 shows the in-repo pattern) — focus walks into the obscured page.
Trigger images get only a `click` listener and no `tabindex`/keydown, so
keyboard users can't open it; `close()` focuses a non-focusable `<img>`,
which no-ops.
### 5.7 Frontend LOWs
- `static/js/gallery.js:122-125,270-275`: math/score overlay is
click-only (no role/tabindex/keydown); `closeOverlay()` focus-returns
to a non-focusable div — focus drops to `<body>`.
- `static/js/popups.js:478,515`: the Wikipedia provider's
`decodeURIComponent` runs synchronously before the `.catch` attaches —
a malformed percent sequence in a link path throws an uncaught
`URIError` per hover.
- `static/js/popups.js:359,390`: fetched monogram SVG injected via
`innerHTML` unescaped — the single unsanitized path in an otherwise
fully escaped pipeline. Build-authored content, so not exploitable
today; the comment acknowledges the trust assumption.
- `static/js/citations.js`: dead file — no template loads it; popups.js
supersedes it. If ever re-added it would double-bind and inject
bibliography innerHTML without popups.js's cloned-node hardening.
Delete.
- `static/js/nav.js:26,30-31`: raw `localStorage` unguarded; if storage
access throws, the throw lands before `toggle.addEventListener`,
leaving the Portals toggle completely dead (utils.js exists precisely
for this).
- `static/js/annotations.js:209-215`: marks are mouse-only; the tooltip's
Delete button is unreachable by keyboard (only recourse is the
all-or-nothing "Clear Annotations").
- `static/js/search.js:10`: unguarded `new PagefindUI(...)` — if the
pagefind bundle 404s, the ReferenceError aborts the whole handler
including the `?q=` pre-fill that the selection-popup "Here" flow
depends on.
- `static/js/semantic-search.js:55-56,96-107`: no
`vectors.length === meta.length * DIM` consistency check — a stale
CDN-cached mismatch yields NaN scores and silently garbage ranking.
(Current files verified consistent: 1,256,448 bytes = 818 × 384 × 4.)
- `static/js/transclude.js:149-151` + `collapse.js:111-114`: nested
transcludes render a bare placeholder (no rescan of injected content);
`reinitCollapse` is not idempotent (would stack toggle buttons if ever
called twice on the same container).
- `static/js/popups.js:985-988,1009-1014`: `daysBetween` uses `Math.abs`,
so future dates render "N days ago" (now.js:17 handles this correctly).
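
The consistency check missing from semantic-search.js (the `vectors.length` item above) is a single multiplication. It is sketched here in Python to show the arithmetic only; the four bytes per value follow from the byte count quoted in that item, and the names are hypothetical.

```python
DIM = 384            # embedding width of the browser-side model
BYTES_PER_VALUE = 4  # implied by 1,256,448 = 818 x 384 x 4


def index_is_consistent(vector_bytes: int, meta_entries: int, dim: int = DIM) -> bool:
    """True when the binary index holds exactly one vector per metadata entry."""
    return vector_bytes == meta_entries * dim * BYTES_PER_VALUE
```

Rejecting the pair on mismatch is cheaper than ranking against misaligned vectors.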
### 5.8 Frontend NITs
- `static/js/copy.js:20-22,39`: code-less `<pre>` fallback copies the
"copy" button label along with content.
- `static/js/score-reader.js:50`: URL rewritten to `?p=1` on every load
even without a `?p=` param.
- `static/js/search-filters.js:271`: `parseInt(v,10) || 0` turns junk
threshold input into an active ≥0 filter that matches everything.
- `static/js/selection-popup.js:90-95`: shift-keyup while typing capitals
in the annotation picker re-summons the selection toolbar over it.
Verified clean: the semantic-search ↔ embed.py contract post-model-split
(DIM 384, 818-entry meta, no prefix for MiniLM — the nomic
`search_document:` prefix is confined to the build-only page path); XSS
escaping across semantic-search, popups providers, map tooltips,
annotations (sole exception §5.7 monogram); theme.js ↔ settings.js
storage schema identical; all JS selector contracts against templates
(including the uncommitted head/nav edits); popups/sidenotes
double-init guards; settings.js and gallery.js focus traps.
---
## 6. Templates & content
### 6.1 Draft in undocumented location is never built — **MED**
`content/drafts/inclusionist-manifesto.md`. WRITING.md:34 says drafts go
under `content/drafts/essays/`; `draftEssayPattern`
(`build/Patterns.hs:46-49`) matches only that, so this file is invisible
even to `make watch`/`make dev` — silently orphaned.
### 6.2 SIMD/PQC essay `repository:` URL 404s — **MED**
`content/essays/where-does-simd-help-post-quantum-cryptography/index.md:24`.
`https://git.levineuwirth.org/where-simd-helps` is missing the owner
segment — verified HTTP 404, while the sibling essay's
`.../neuwirth/beyond_comorbidity_indices` returns 200.
### 6.3 Tracked drafts contradict the gitignore policy — **MED**
`.gitignore:88` ignores `content/drafts/` as local-only "working notes,"
but `git ls-files -i -c` shows four tracked drafts
(`digital_progeny.md`, `modern_idolatry.md`, `test-essay.md`,
`university_care.md`) — ignore rules don't untrack, so edits are
auto-staged by `make build` and pushed publicly by deploy. The over-broad
`**/.env.*` pattern also matches the tracked `.env.example`.
### 6.4 Template/content LOWs and NITs
- `content/colophon.md:5`: `modified:` is dead frontmatter — nothing
reads it; `$date-modified$` (page-footer.html:108) is Hakyll's
`dateField` over the `date` key.
- Seven files end frontmatter with a valueless `confidence-history:`
(YAML null; WRITING.md:97 documents a list of ints) — harmless, but
`content/essays/scaling_outage.md` also retains the full WRITING.md
scaffold comments in a published essay.
- `static/images/canto31.jpg`: still 4.0 MB (prior-audit §6.1 unfixed).
- `templates/blog-post.html:25,34`: `id="similar-links"` appears twice in
mutually exclusive `$if$` branches — safe, fragile under edit.
- `content/drafts/essays/digital_progeny.md`: title duplicates the
published "The Specification Dilemma" — stale draft.
- Frontmatter flags `home:`/`library:`/`links:`/`search:`/`portal:` are
consumed (head.html CSS gates, default.html:6 `data-portal`) but
undocumented in WRITING.md.
Verified clean: all `$partial(...)$` includes resolve; all ~140 distinct
template variables have context providers; no missing `alt` attributes,
tag-balance failures, or within-page duplicate IDs in composed pages; all
26 CSS files referenced by head.html exist; sampled enum values across
all sections are legal per WRITING.md and Contexts.hs validation lists.
---
## 7. Documentation / spec drift (WRITING.md, README.md)
### 7.1 `js:` page-script paths documented as content-relative; emitted root-relative — **MED**
`WRITING.md:773-775` vs `templates/default.html:37`
(`<script src="/$script-src$" defer>`). The doc claims a composition's
`js: scripts/widget.js` serves at `/music/symphony/scripts/widget.js`; the
template emits raw root-relative frontmatter. The only current user
(memento-mori) works by coincidence of its root-level route. A
composition following the doc would 404.
### 7.2 "Standalone page `content/my-page/index.md`" has no generic rule — **MED**
`WRITING.md:20` presents directory-form standalone pages as a general
capability; `build/Site.hs` hardcodes only `content/me/index.md` (293) and
`content/memento-mori/index.md` (307); the generic rule (351) matches flat
`content/*.md` only. A new `content/my-page/index.md` silently doesn't
build.
### 7.3 Portal table lists 8 portals; the build has 9 — **MED**
`WRITING.md:221-231` omits Photography, which is in `homePortals`
(`build/Site.hs:50-60`), the nav, and `content/tag-meta/photography.md`.
### 7.4 Three implemented frontmatter fields undocumented — **MED**
WRITING.md:3 claims to cover "all frontmatter fields"; zero hits for:
`summary:` (`build/Contexts.hs:415-427`, rendered by essay.html:16 and
reading.html:12, in live use), `revised:` (`build/Contexts.hs:815`
`getRevisions` — drives `$date-display$`/`$date-original$`/
`$revision-note$` and list sort order), `keywords:`
(`build/Contexts.hs:283` → `/bibliography/<kw>/` links).
### 7.5 Documentation LOWs
- `WRITING.md:268-269,82`: default citation style called "Chicago
Author-Date"; the injected CSL (`build/Citations.hs:114,167-168`) is
`data/chicago-notes.csl`, titled "Chicago Notes Bibliography".
- `README.md:12,19`: `make watch` described as "rebuilds on save without
a server"; it runs Hakyll's preview server (WRITING.md:1139 has it
right).
- `WRITING.md:105-109`: `history:` example ordering contradicts the code
(see §3.5).
---
## 8. nginx, Makefile & deployment
### 8.1 Multi-line CSP value embeds literal `\` + LF bytes — **MED**
`nginx/security-headers.conf:60-71`. The
`Content-Security-Policy-Report-Only` value is a single quoted string
spanning 12 lines with trailing `\` characters — nginx has no
line-continuation inside quoted strings, so the emitted header contains
raw backslash, LF, and leading-space bytes between directives. Raw LF in
a header value is illegal in HTTP/2 (vhost example enables `http2 on`);
strict clients reject the whole response. Sent on every response even as
Report-Only. Must be collapsed to one line.
### 8.2 CSP gaps that will fire under enforcement — **MED**
`nginx/security-headers.conf:66-67`. (a) `font-src 'self' data:` blocks
KaTeX webfonts: head.html:61 loads `katex.min.css` from cdn.jsdelivr.net,
whose relative font URLs resolve to the CDN. (b) `connect-src 'self'`
blocks the onnxruntime `.wasm` that transformers.js v2 (dynamically
imported in `static/js/semantic-search.js:25`) fetches from jsdelivr —
the config comment covers the same-origin model files but not the
runtime. Both latent while Report-Only.
### 8.3 Makefile auto-commit sweeps any pre-staged changes — **MED**
`Makefile:28-29`. `git add content/` followed by
`git diff --cached --quiet || git commit -m "auto: ..."` commits the
*entire index* — anything previously staged gets folded into an
`auto: <timestamp> [skip ci]` commit and pushed publicly on deploy. Use
`git commit -- content/` or verify no foreign paths are staged.
### 8.4 Makefile LOWs
- pdf-thumbs: the `find | while read` pipeline swallows `pdftoppm`
failures (loop exit status is the last iteration's) — a corrupt PDF
silently ships without a thumbnail.
- deploy: prerequisite order `clean build sign` is guaranteed only under
serial make; no `.NOTPARALLEL:` guard for `-j` invocations. (Confirmed:
deploy does run `clean` first; `.PHONY` is complete; `.env` export
allowlist is sound.)
- `tools/hooks/pre-commit-marks.sh` is documented (Makefile:175 comment)
but not installed — `.git/hooks/` has only samples and `core.hooksPath`
is unset.
Verified clean: all seven `data/` JSON/YAML files parse;
`data/embed-cache-pages.npz` is untracked, so the new gitignore entry is
fully effective; nginx archive.conf's add_header-inheritance re-include is
correct; no redirect loops; popup-proxy rate-limit/cache zones correctly
documented for http{} scope.
---
## 9. Working-tree diff review (branding refresh + embed split)
The model contract is **intact** — the diff splits one MiniLM pipeline
into two: pages now use nomic-embed-text-v1.5 (768d, build-only, for
similar-links.json); paragraphs stay on all-MiniLM-L6-v2@c9745ed (384d,
the browser contract). download-model.sh, model-checksums.sha256,
semantic-search.js (`DIM = 384`), and both WRITING.md lines (1108 nomic
for Related-pages, 1128 MiniLM for client search) are all consistent.
Icon declarations all match real files (verified with `file`: apple-touch
180×180, favicon-96 96×96, manifest PNGs 192/512, og-image 1200×630
matching declared og:image dimensions; the webp sidecar was regenerated).
Open items beyond §1.3/§1.4/§4.1:
### 9.1 32.8 KB traced SVG inlined into every page — **MED**
`templates/partials/logo-mark.svg` (32,818 bytes, potrace-style single
giant `<path>`) is inlined via the nav partial into every HTML page —
a ~33 KB per-page weight regression (pre-compression). The two-tone
`--logo-ink`/`--logo-bg` cutout (components.css:72-98) genuinely needs
inline SVG or `<use>`; an external sprite + `<use href>` restores
cacheability. Better still: a hand-drawn or simplified path — a traced
bitmap at nav size carries detail that can never resolve.
### 9.2 Icon asset bloat — **LOW**
`static/favicon.ico` is now 71,766 bytes; parsed directory shows
16/32/48/64/128/256 px entries, the 128+256 pair alone 55.8 KB. The .ico
is only the legacy fallback (modern browsers take the SVG); 16+32+48
(~8 KB) is conventional. `static/favicon.svg` is a 32,844-byte traced
path. `static/images/link-icons/internal.svg` went ~2 KB → 32,818 bytes
yet renders at 0.7–1.6 rem via CSS mask in three stylesheets
(components.css:853, typography.css:833, popups.css:161).
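
For reference, a directory listing like the one quoted for `favicon.ico` can be reproduced with a few lines of Python. This is a minimal sketch based on the standard ICO layout (a 6-byte header followed by 16-byte directory entries); the function name `ico_directory` is an assumption, and this is not necessarily the tool the audit used.

```python
import struct


def ico_directory(data: bytes) -> list[tuple[int, int, int]]:
    """Return (width, height, byte_size) for each image in an .ico file."""
    # ICONDIR header: reserved (0), type (1 = icon), image count.
    reserved, kind, count = struct.unpack_from("<HHH", data, 0)
    if reserved != 0 or kind != 1:
        raise ValueError("not an ICO file")
    entries = []
    for i in range(count):
        # ICONDIRENTRY is 16 bytes: width, height, colour count, reserved,
        # planes, bit depth, size of the image data, offset of the image data.
        w, h, _colors, _res, _planes, _bpp, size, _offset = struct.unpack_from(
            "<BBBBHHII", data, 6 + 16 * i
        )
        # A stored 0 means 256 px, because each dimension is a single byte.
        entries.append((w or 256, h or 256, size))
    return entries
```

Passing the bytes of `static/favicon.ico` to `ico_directory` yields one tuple per embedded image, so the share of the file taken by the 128 px and 256 px entries can be summed directly from the third field.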
### 9.3 Webmanifest regressions — **NIT**
`static/site.webmanifest`: `purpose` changed maskable→`any` for both
icons (Android adaptive launchers will letterbox; convention is separate
`any` + `maskable` entries); still no `start_url`/`scope`/`description`
(Lighthouse installability warnings). JSON valid; icons verified.
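
The gaps named in this finding can be checked mechanically. The sketch below is a hedged illustration, not a Lighthouse replacement: `manifest_warnings` is a hypothetical helper, the sample manifest is invented, and the only rule it relies on beyond this finding is that an icon with no `purpose` member defaults to `any`.

```python
import json


def manifest_warnings(manifest: dict) -> list[str]:
    """List the manifest gaps named in finding 9.3 (not a full audit)."""
    warnings = []
    # Collect every purpose token declared across the icon entries.
    purposes = set()
    for icon in manifest.get("icons", []):
        purposes.update(icon.get("purpose", "any").split())
    for wanted in ("any", "maskable"):
        if wanted not in purposes:
            warnings.append(f"no icon with purpose '{wanted}'")
    for key in ("start_url", "scope", "description"):
        if key not in manifest:
            warnings.append(f"missing '{key}'")
    return warnings


# Invented manifest with the same shape of problem: both icons are 'any'.
example = json.loads("""
{
  "name": "Example",
  "icons": [
    {"src": "/icon-192.png", "sizes": "192x192", "purpose": "any"},
    {"src": "/icon-512.png", "sizes": "512x512", "purpose": "any"}
  ]
}
""")
print(manifest_warnings(example))
```

Adding a second pair of icon entries with `"purpose": "maskable"` plus the three missing members would make the list come back empty.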
---
## 10. Prior audit (AUDIT.md 2026-05-07) follow-up
| Finding | Status |
|---|---|
| §1.1 freeze unsolvable | **Effectively still open** — aeson pin fixed, but the freeze broke again via `distributive` after a system update (§1.1 above); the underlying freeze-vs-system-db fragility is unaddressed |
| §1.3 Python version mismatch | Fixed (`requires-python = ">=3.14"` matches `.python-version`) |
| §1.4 model checksums | Fixed (`tools/model-checksums.sha256`, 5 entries) |
| §9.1 nginx headers | Fixed (`nginx/security-headers.conf` + vhost example, README'd) — but see §8.1/§8.2 for new issues in that file |
| §6.1 `canto31.jpg` 4 MB | **Unfixed** |
| robots.txt / sitemap | Fixed (Site.hs:941/963, present in `_site/`) |
| README `paper/`/`spec.md` ghosts | Fixed |
| rsync target quoting | Fixed |
| date-quoting doc | Fixed (WRITING.md:106) |
| tag-meta no-title exception | Fixed (WRITING.md:238-251) |
---
## Suggested triage order
1. ~~`tools/refreeze.sh`~~ (§1.1 — in progress)
2. Delete `data/embed-cache-pages.npz.tmp.npz`; widen the gitignore
pattern; `git add` `logo-mark.svg` + `og-image.png` before committing
the branding diff (§1.4, §4.1)
3. Guard `ArchiveIndex.hs` file reads with `doesFileExist` (§1.2)
4. Pin or sandbox the nomic remote code (§1.3)
5. Fix the `/fiction/` + `/poetry/` 404s (§2.1) and the production-visible
frontend MEDs (§5.1, §5.2)
6. Collapse the nginx CSP to one line before ever flipping it to
enforcing (§8.1, §8.2)
7. The rest by severity as time allows


@ -1,5 +1,10 @@
.PHONY: build deploy sign download-model download-pdfjs download-leaflet compress-assets convert-images pdf-thumbs pdfs watch clean dev audit-marks archive-gc archive-wayback archive-check .PHONY: build deploy sign download-model download-pdfjs download-leaflet compress-assets convert-images pdf-thumbs pdfs watch clean dev audit-marks archive-gc archive-wayback archive-check
# deploy's prerequisite order (clean -> build -> sign) is only correct
# serially; under `make -j` they could interleave. This build has no
# intra-target parallelism worth preserving, so disable it outright.
.NOTPARALLEL:
# Source .env for deploy / GitHub config if it exists. # Source .env for deploy / GitHub config if it exists.
# .env format: KEY=value (one per line, no `export` prefix, no quotes needed). # .env format: KEY=value (one per line, no `export` prefix, no quotes needed).
# Only the variables explicitly listed below are exported to recipe # Only the variables explicitly listed below are exported to recipe
@ -21,8 +26,12 @@ build:
# so a stray secret dropped under content/ is NOT auto-staged. To # so a stray secret dropped under content/ is NOT auto-staged. To
# intentionally commit a normally-ignored file, use `git add -f` # intentionally commit a normally-ignored file, use `git add -f`
# manually before running `make build`. # manually before running `make build`.
#
# The commit and its guard are pathspec-limited to content/ so that
# anything the user had previously staged for other reasons is left
# staged, not silently swept into the auto-commit.
@git add content/ @git add content/
@git diff --cached --quiet || git commit -m "auto: $$(date -u +%Y-%m-%dT%H:%M:%SZ) [skip ci]" @git diff --cached --quiet -- content/ || git commit -m "auto: $$(date -u +%Y-%m-%dT%H:%M:%SZ) [skip ci]" -- content/
@mkdir -p data @mkdir -p data
@date +%s > data/build-start.txt @date +%s > data/build-start.txt
@./tools/convert-images.sh @./tools/convert-images.sh
@ -110,12 +119,16 @@ convert-images:
# Thumbnails are written as static/papers/foo.thumb.png alongside each PDF. # Thumbnails are written as static/papers/foo.thumb.png alongside each PDF.
# Skipped silently when pdftoppm is not installed or static/papers/ is empty. # Skipped silently when pdftoppm is not installed or static/papers/ is empty.
pdf-thumbs: pdf-thumbs:
# A failing pdftoppm must at least warn: the `find | while` pipeline's
# exit status is the last iteration's, so without the `||` a corrupt
# PDF would silently ship without a thumbnail.
@if command -v pdftoppm >/dev/null 2>&1; then \ @if command -v pdftoppm >/dev/null 2>&1; then \
find static/papers -name '*.pdf' 2>/dev/null | while read pdf; do \ find static/papers -name '*.pdf' 2>/dev/null | while read pdf; do \
thumb="$${pdf%.pdf}.thumb"; \ thumb="$${pdf%.pdf}.thumb"; \
if [ ! -f "$${thumb}.png" ] || [ "$$pdf" -nt "$${thumb}.png" ]; then \ if [ ! -f "$${thumb}.png" ] || [ "$$pdf" -nt "$${thumb}.png" ]; then \
echo " pdf-thumb $$pdf"; \ echo " pdf-thumb $$pdf"; \
pdftoppm -r 100 -f 1 -l 1 -png -singlefile "$$pdf" "$$thumb"; \ pdftoppm -r 100 -f 1 -l 1 -png -singlefile "$$pdf" "$$thumb" \
|| echo "Warning: pdf-thumb failed for $$pdf (page ships without a thumbnail)" >&2; \
fi; \ fi; \
done; \ done; \
else \ else \


@ -9,14 +9,15 @@ with a custom build system in `build/` and a Haskell + JS + Python toolchain.
```sh ```sh
make build # one-shot production build into _site/ make build # one-shot production build into _site/
make dev # dev build (drafts visible) + local server on :8000 make dev # dev build (drafts visible) + local server on :8000
make watch # cabal-watch rebuild (drafts visible) make watch # Hakyll live-reload dev server (drafts visible)
make clean # cabal run site -- clean make clean # cabal run site -- clean
make deploy # clean → build → sign → push → rsync to VPS make deploy # clean → build → sign → push → rsync to VPS
``` ```
`make build` always runs `make clean` implicitly when invoked from `make deploy`. `make build` always runs `make clean` implicitly when invoked from `make deploy`.
For day-to-day work, prefer `make dev` (which serves the site on For day-to-day work, prefer `make dev` (which serves the site on
`http://localhost:8000`) or `make watch` (rebuilds on save without a server). `http://localhost:8000`) or `make watch` (Hakyll's live-reload preview server,
which rebuilds on save and serves the site locally).
**Run `make build` any time you add or replace binary assets** (JPEG/PNG **Run `make build` any time you add or replace binary assets** (JPEG/PNG
figures, PDFs, music assets). `make dev` and `make watch` skip the figures, PDFs, music assets). `make dev` and `make watch` skip the


@ -17,15 +17,22 @@ frontmatter fields, and every authoring feature available in the Markdown source
| Fiction | `content/fiction/my-story.md` | `/fiction/my-story.html` | | Fiction | `content/fiction/my-story.md` | `/fiction/my-story.html` |
| Composition | `content/music/{slug}/index.md` | `/music/{slug}/` | | Composition | `content/music/{slug}/index.md` | `/music/{slug}/` |
| Standalone page | `content/my-page.md` | `/my-page.html` | | Standalone page | `content/my-page.md` | `/my-page.html` |
| Standalone page (with co-located assets) | `content/my-page/index.md` | `/my-page.html` | | Standalone page (with co-located assets; needs a dedicated rule) | `content/me/index.md` | `/me.html` |
| Draft essay | `content/drafts/essays/my-draft.md` | `/drafts/essays/my-draft.html` (dev only) | | Draft essay | `content/drafts/essays/my-draft.md` | `/drafts/essays/my-draft.html` (dev only) |
File names become URL slugs. Use lowercase, hyphen-separated words. File names become URL slugs. Use lowercase, hyphen-separated words.
If a standalone page embeds co-located SVG score fragments or other relative assets, Flat `content/<page>.md` is the generic standalone form — any flat file dropped
place it in its own directory (`content/my-page/index.md`) rather than as a flat file. into `content/` builds automatically. Directory-form standalone pages
Score fragment paths are resolved relative to the source file's directory; a flat (`content/my-page/index.md`) are **not** picked up by the generic rule; each one
`content/my-page.md` would resolve them from `content/`, which is wrong. requires its own dedicated `match` rule in `build/Site.hs`. The two existing
ones are `content/me/index.md` and `content/memento-mori/index.md` — follow
their pattern when adding another.
The directory form exists for pages that embed co-located SVG score fragments
or other relative assets: score fragment paths are resolved relative to the
source file's directory, and a flat `content/my-page.md` would resolve them
from `content/`, which is wrong.
--- ---
@ -65,9 +72,12 @@ subtitle: "An Optional Secondary Line" # optional; rendered below the title in
date: 2026-03-15 # required; used for ordering, feed, and display date: 2026-03-15 # required; used for ordering, feed, and display
abstract: > # optional; shown in the metadata block and link previews abstract: > # optional; shown in the metadata block and link previews
A one-paragraph description of the piece. A one-paragraph description of the piece.
summary: | # optional; rendered in a "Summary" box near the abstract
A structured summary. **Markdown allowed** — bold, lists, multiple paragraphs.
tags: # optional; see Tags section tags: # optional; see Tags section
- nonfiction - nonfiction
- nonfiction/philosophy - nonfiction/philosophy
keywords: [lattices, simd] # optional; links to /bibliography/<keyword>/ pages (list or comma-separated string)
authors: # optional; overrides the default "Levi Neuwirth" link authors: # optional; overrides the default "Levi Neuwirth" link
- "Levi Neuwirth | /me.html" - "Levi Neuwirth | /me.html"
- "Collaborator | https://their.site" - "Collaborator | https://their.site"
@ -79,7 +89,7 @@ further-reading: # optional; see Citations section
- someKey - someKey
- anotherKey - anotherKey
bibliography: data/custom.bib # optional; overrides data/bibliography.bib bibliography: data/custom.bib # optional; overrides data/bibliography.bib
csl: data/custom.csl # optional; overrides Chicago Author-Date csl: data/custom.csl # optional; overrides Chicago Notes Bibliography
no-collapse: true # optional; disables collapsible h2/h3 sections no-collapse: true # optional; disables collapsible h2/h3 sections
repository: https://git.levineuwirth.org/levi/repo # optional; "Repository" link in metadata repository: https://git.levineuwirth.org/levi/repo # optional; "Repository" link in metadata
preprint: /papers/my-essay.pdf # optional; "Preprint" link in metadata (typeset PDF version) preprint: /papers/my-essay.pdf # optional; "Preprint" link in metadata (typeset PDF version)
@ -101,12 +111,20 @@ confidence-history: # list of integers; trend arrow derived from last two
peer-status: under-review # optional; unreviewed (default) | under-review | peer-reviewed | published | retracted peer-status: under-review # optional; unreviewed (default) | under-review | peer-reviewed | published | retracted
result-shape: mixed # optional; positive | negative | mixed | comparative | descriptive result-shape: mixed # optional; positive | negative | mixed | comparative | descriptive
# Version history — optional; falls back to git log, then to date frontmatter # Version history — optional; falls back to git log, then to date frontmatter.
# Entries may be listed in any order — they are sorted by date at build time.
history: history:
- date: 2026-03-01 # ISO date; unquoted is fine (the Haskell YAML parser keeps it as a string) - date: 2026-03-14 # ISO date; unquoted is fine (the Haskell YAML parser keeps it as a string)
note: Initial draft
- date: 2026-03-14
note: Expanded typography section; added citations note: Expanded typography section; added citations
- date: 2026-03-01
note: Initial draft
# Revision log — optional; drives the date shown on cards and list pages
# (see Revision dates section)
revised:
- date: "2026-04-10"
note: "expanded the section on typography"
- date: "2026-03-20" # note is optional per-entry
--- ---
``` ```
@ -226,6 +244,7 @@ The top-level segment maps to a **portal** in the nav:
| Miscellany | `/miscellany/` | | Miscellany | `/miscellany/` |
| Music | `/music/` | | Music | `/music/` |
| Nonfiction | `/nonfiction/` | | Nonfiction | `/nonfiction/` |
| Photography | `/photography/` |
| Poetry | `/poetry/` | | Poetry | `/poetry/` |
| Research | `/research/` | | Research | `/research/` |
| Tech | `/tech/` | | Tech | `/tech/` |
@ -265,7 +284,8 @@ The URL part is optional.
## Citations ## Citations
The citation pipeline uses Chicago Author-Date style. The bibliography lives at The citation pipeline uses Chicago Notes Bibliography style
(`data/chicago-notes.csl`). The bibliography lives at
`data/bibliography.bib` (BibLaTeX format) by default; override per-page with `data/bibliography.bib` (BibLaTeX format) by default; override per-page with
`bibliography` and `csl`. `bibliography` and `csl`.
@ -278,7 +298,7 @@ Multiple sources agree.[@jones2019; @brown2021]
``` ```
Inline citations render as numbered superscripts `[1]`, `[2]`, etc. The Inline citations render as numbered superscripts `[1]`, `[2]`, etc. The
bibliography section appears automatically in the page footer. `citations.js` bibliography section appears automatically in the page footer. `popups.js`
adds hover previews showing the full reference. adds hover previews showing the full reference.
### Further reading ### Further reading
@ -754,9 +774,8 @@ at the top of the catalog.
## Page scripts ## Page scripts
For pages that need custom JavaScript (interactive widgets, visualisations, etc.), For pages that need custom JavaScript (interactive widgets, visualisations, etc.),
place the JS file alongside the content and reference it via the `js:` frontmatter reference the JS file via the `js:` frontmatter key. The file is injected as a
key. The file is copied to `_site/` and injected as a deferred `<script>` at the deferred `<script>` at the bottom of `<body>`.
bottom of `<body>`.
```yaml ```yaml
js: scripts/memento-mori.js # single file js: scripts/memento-mori.js # single file
@ -770,12 +789,18 @@ js:
- scripts/widget-b.js - scripts/widget-b.js
``` ```
Paths are relative to the content file. A composition at Paths are **site-root-relative**, not relative to the content file: the template
`content/music/symphony/index.md` with `js: scripts/widget.js` serves the emits the value verbatim with a leading `/` prepended. Write the path without a
script at `/music/symphony/scripts/widget.js`. leading slash. `js: scripts/widget.js` loads `/scripts/widget.js` regardless of
where the page lives — a composition at `content/music/symphony/index.md` with
that value does *not* get `/music/symphony/scripts/widget.js`.
No changes to the build system are needed — the `content/**/*.js` glob rule The script file must live where the build serves that URL. The `content/**/*.js`
copies all JS files from `content/` to `_site/` automatically. glob rule copies JS files to `_site/` with the `content/` prefix stripped, so
`content/scripts/widget.js` is served at `/scripts/widget.js` — this is the
current convention (the memento-mori page keeps its script at
`content/scripts/memento-mori.js` and references it as
`js: scripts/memento-mori.js`).
--- ---
@ -896,7 +921,8 @@ should copy and adapt it; the file documents the §2.2 visual contract
The version history footer section uses a three-tier fallback: The version history footer section uses a three-tier fallback:
1. **`history:` frontmatter** — your authored notes, shown exactly as written. 1. **`history:` frontmatter** — your authored notes. Entries may be listed in
any order — they are sorted by date at build time.
2. **Git log** — if no `history:` key, dates are extracted from `git log --follow`. 2. **Git log** — if no `history:` key, dates are extracted from `git log --follow`.
Entries have no message (date only). Entries have no message (date only).
3. **`date:` frontmatter** — if git has no commits for the file, falls back to 3. **`date:` frontmatter** — if git has no commits for the file, falls back to
@ -910,14 +936,50 @@ descriptive:
```yaml ```yaml
history: history:
- date: 2026-03-01
note: Initial draft
- date: 2026-03-14 - date: 2026-03-14
note: Expanded section 3; incorporated feedback from peer review note: Expanded section 3; incorporated feedback from peer review
- date: 2026-03-01
note: Initial draft
``` ```
--- ---
## Revision dates
The `revised:` key records substantive revisions and drives the date shown on
item cards and list pages. Two accepted shapes:
```yaml
revised: "2026-04-10" # scalar shorthand — one revision, no note
revised: # canonical list of objects
- date: "2026-04-10"
note: "expanded the section on Shestov"
- date: "2025-12-03" # note is optional per-entry
```
Dates are ISO `YYYY-MM-DD` strings. Entries may be listed in any order — they
are sorted by date at build time, most recent first. Entries missing `date:`
or carrying non-string values are silently dropped; the build never fails on
a malformed `revised:` block.
Effects:
- **`$date-display$` / `$date-iso$`** — cards and list pages show the
most-recent revision date instead of the creation date.
- **Sort order** — revision-aware lists (`/new.html`, tag pages, the library)
sort by the display date, so a freshly revised piece moves to the top.
- **`$date-original$`** — when the latest revision date differs from the
creation date, the card adds a "revised from …" annotation showing the
original date.
- **`$revision-note$`** — the note on the most-recent entry renders as an
italicized line under the abstract on the card.
`revised:` is independent of `history:` (the version-history footer above);
add a matching `history:` entry if the revision should appear there too.
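
The normalisation rules above (two accepted shapes, malformed entries dropped, newest first) can be sketched in Python. This is an illustration only: the site's build is Haskell, and `normalise_revised` is a hypothetical name. The sketch reads "carrying non-string values" as dropping the whole entry when either `date` or `note` is not a string, which is an assumption about the intended behaviour.

```python
def normalise_revised(value) -> list[dict]:
    """Return revision entries, most recent first; malformed entries are dropped."""
    if isinstance(value, str):
        # Scalar shorthand: one revision, no note.
        value = [{"date": value}]
    if not isinstance(value, list):
        return []
    kept = []
    for entry in value:
        if not isinstance(entry, dict):
            continue
        date = entry.get("date")
        note = entry.get("note")
        # Missing or non-string date, or a non-string note: drop silently.
        if not isinstance(date, str):
            continue
        if note is not None and not isinstance(note, str):
            continue
        cleaned = {"date": date}
        if note is not None:
            cleaned["note"] = note
        kept.append(cleaned)
    # ISO YYYY-MM-DD strings sort chronologically as plain strings.
    return sorted(kept, key=lambda e: e["date"], reverse=True)
```

Because the entries come back sorted, the first element is the one that would supply the display date and the revision note on a card.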
---
## Typography features ## Typography features
Applied automatically at build time; no markup needed. Applied automatically at build time; no markup needed.
@ -1125,7 +1187,7 @@ These pages are built automatically and require no content files or markup:
| Author indexes | `/authors/<slug>/` | All content attributed to an author | | Author indexes | `/authors/<slug>/` | All content attributed to an author |
| Random manifest | `/random-pages.json` | JSON array of page URLs for the random-page button | | Random manifest | `/random-pages.json` | JSON array of page URLs for the random-page button |
| Atom feeds | `/feed.xml`, `/music/feed.xml` | All content feed + music-only feed | | Atom feeds | `/feed.xml`, `/music/feed.xml` | All content feed + music-only feed |
| Search | `/search.html` | Pagefind full-text search + client-side semantic search (`nomic-embed-text-v1.5` ONNX model) | | Search | `/search.html` | Pagefind full-text search + client-side semantic search (`all-MiniLM-L6-v2` ONNX model) |
--- ---


@ -163,11 +163,19 @@ readManifest = do
else do else do
parsed <- Y.decodeFileEither manifestPath parsed <- Y.decodeFileEither manifestPath
case parsed of case parsed of
Right es -> return es -- An empty or all-comments file decodes as YAML @Null@,
Left e -> do -- not as a list. That is the legitimate "drained to zero
hPutStrLn stderr $ -- entries" state, not a broken file — treat it as the
"[archive] FATAL: manifest.yaml: " ++ show e -- empty manifest the absent-file branch already supports.
exitFailure Right A.Null -> return []
Right v -> case A.fromJSON v of
A.Success es -> return es
A.Error msg -> fatal msg
Left e -> fatal (show e)
where
fatal msg = do
hPutStrLn stderr $ "[archive] FATAL: manifest.yaml: " ++ msg
exitFailure
readRemovedUrls :: IO (Set.Set T.Text) readRemovedUrls :: IO (Set.Set T.Text)
readRemovedUrls = do readRemovedUrls = do
@ -265,8 +273,17 @@ loadArchiveEntries = do
removed <- readRemovedUrls removed <- readRemovedUrls
validateManifestEntries manifest removed validateManifestEntries manifest removed
provByUrl <- readProvenances provByUrl <- readProvenances
-- Join on normalised URLs, like every other URL comparison in the
-- archive system: editing a manifest URL to a normalisation-
-- equivalent form (http->https, trailing slash, tracking params)
-- must keep matching its provenance — an exact-string join would
-- silently unpublish the page while ArchiveIndex's normalised
-- filter keeps links pointing at it. Key collisions can't occur:
-- validateManifestEntries rejects normalised duplicates.
let normKey = T.unpack . normalizeUrl . T.pack
provByNorm = Map.mapKeys normKey provByUrl
fmap catMaybes $ forM manifest $ \me -> fmap catMaybes $ forM manifest $ \me ->
case Map.lookup (meUrl me) provByUrl of case Map.lookup (normKey (meUrl me)) provByNorm of
Nothing -> return Nothing Nothing -> return Nothing
Just (slug, pv) -> do Just (slug, pv) -> do
let dir = "archive/" ++ slug let dir = "archive/" ++ slug
@ -299,6 +316,12 @@ loadArchiveEntries = do
-- --------------------------------------------------------------------------- -- ---------------------------------------------------------------------------
-- | All archive rules. Called once from 'Site.rules'. -- | All archive rules. Called once from 'Site.rules'.
--
-- The manifest is read here in 'preprocess' (and 'ArchiveIndex' reads
-- its sidecars in once-per-process CAFs), so archive state is fixed at
-- rule-generation time: under @site watch@, edits to @manifest.yaml@,
-- @removed.yaml@, or the regenerated state JSONs are not picked up
-- until the process restarts. One-shot builds are unaffected.
archiveRules :: Rules () archiveRules :: Rules ()
archiveRules = do archiveRules = do
entries <- preprocess loadArchiveEntries entries <- preprocess loadArchiveEntries
@ -562,10 +585,17 @@ tallyOf xs = intercalate " \183 "
| (k, c) <- Map.toList (Map.fromListWith (+) [ (x, 1 :: Int) | x <- xs ]) ] | (k, c) <- Map.toList (Map.fromListWith (+) [ (x, 1 :: Int) | x <- xs ]) ]
-- | The median of a list of ages, as @"N days"@; an em dash when empty. -- | The median of a list of ages, as @"N days"@; an em dash when empty.
-- An even-length list takes the mean of the two middle elements,
-- rounded to the nearest whole day.
medianAge :: [Int] -> String medianAge :: [Int] -> String
medianAge [] = "\8212" medianAge [] = "\8212"
medianAge xs = medianAge xs =
let m = sort xs !! (length xs `div` 2) let sorted = sort xs
n = length sorted
upper = sorted !! (n `div` 2)
lower = sorted !! (n `div` 2 - 1) -- forced only when n is even
m | odd n = upper
| otherwise = (lower + upper + 1) `div` 2
in show m ++ if m == 1 then " day" else " days" in show m ++ if m == 1 then " day" else " days"
-- | Parse a @YYYY-MM-DD@ date; 'Nothing' on malformed input. -- | Parse a @YYYY-MM-DD@ date; 'Nothing' on malformed input.


@ -15,11 +15,18 @@
-- * @Archive@ — surfaces each entry's rot status on its page, the -- * @Archive@ — surfaces each entry's rot status on its page, the
-- @/archive/@ index, and the @/build/@ telemetry. -- @/archive/@ index, and the @/build/@ telemetry.
-- --
-- Both files are loaded once per build via @unsafePerformIO@ CAFs. An -- Both files are loaded once per *process* via NOINLINE
-- absent or malformed file degrades safely: an empty index makes the -- @unsafePerformIO@ CAFs (as are the manifest/removed URL sets below).
-- An absent or malformed file degrades safely: an empty index makes the
-- link consumers no-op; an absent state file makes every entry @Live@ -- link consumers no-op; an absent state file makes every entry @Live@
-- (the safe default — no link flip). @archive.py check@ is decoupled -- (the safe default — no link flip). @archive.py check@ is decoupled
-- from @make build@; a build consumes whatever state file exists. -- from @make build@; a build consumes whatever state file exists.
--
-- Consequence of the once-per-process read (shared with the manifest
-- read in 'Archive.archiveRules'): under @site watch@, edits to
-- @manifest.yaml@, @removed.yaml@, or the regenerated state JSONs are
-- not re-read — the server renders stale archive state until restart.
-- One-shot builds (@make build@ / @make deploy@) are unaffected.
module ArchiveIndex module ArchiveIndex
( ArchiveStatus (..) ( ArchiveStatus (..)
, statusName , statusName
@ -132,18 +139,26 @@ activeUrls = unsafePerformIO $ do
{-# NOINLINE rawIndex #-} {-# NOINLINE rawIndex #-}
rawIndex :: Map Text IdxEntry rawIndex :: Map Text IdxEntry
rawIndex = unsafePerformIO $ do rawIndex = unsafePerformIO $ do
decoded <- A.eitherDecodeFileStrict' indexPath exists <- doesFileExist indexPath
let parsed = either (const Map.empty) id decoded if not exists
return $ Map.filterWithKey then return Map.empty
(\canon _ -> normalizeUrl canon `Set.member` activeUrls) else do
parsed decoded <- A.eitherDecodeFileStrict' indexPath
let parsed = either (const Map.empty) id decoded
return $ Map.filterWithKey
(\canon _ -> normalizeUrl canon `Set.member` activeUrls)
parsed
-- | @url -> status@. Absent/malformed file -> empty (every entry 'Live'). -- | @url -> status@. Absent/malformed file -> empty (every entry 'Live').
{-# NOINLINE rawState #-} {-# NOINLINE rawState #-}
rawState :: Map Text ArchiveStatus rawState :: Map Text ArchiveStatus
rawState = unsafePerformIO $ do rawState = unsafePerformIO $ do
decoded <- A.eitherDecodeFileStrict' statePath exists <- doesFileExist statePath
return $ either (const Map.empty) (Map.map seStatus) decoded if not exists
then return Map.empty
else do
decoded <- A.eitherDecodeFileStrict' statePath
return $ either (const Map.empty) (Map.map seStatus) decoded
-- | @normalised-url -> slug@: the canonical key and every alias from -- | @normalised-url -> slug@: the canonical key and every alias from
-- @archive-index.json@, each fed through 'normalizeUrl'. Both keys and -- @archive-index.json@, each fed through 'normalizeUrl'. Both keys and


@ -138,6 +138,8 @@ isPageLink u
| otherwise = | otherwise =
not (T.isPrefixOf "http://" u) && not (T.isPrefixOf "http://" u) &&
not (T.isPrefixOf "https://" u) && not (T.isPrefixOf "https://" u) &&
-- protocol-relative //host/path is external, not a page path
not (T.isPrefixOf "//" u) &&
not (T.isPrefixOf "#" u) && not (T.isPrefixOf "#" u) &&
not (T.isPrefixOf "mailto:" u) && not (T.isPrefixOf "mailto:" u) &&
not (T.isPrefixOf "tel:" u) && not (T.isPrefixOf "tel:" u) &&
@ -213,18 +215,28 @@ splitSentences = go []
-- For every internal link in a paragraph, emit an entry carrying the HTML -- For every internal link in a paragraph, emit an entry carrying the HTML
-- of the sentence containing the link (default display) and the HTML of -- of the sentence containing the link (default display) and the HTML of
-- the full paragraph (hover/popup context). -- the full paragraph (hover/popup context).
-- Recurses into Div, BlockQuote, BulletList, and OrderedList. -- Recurses into Div, BlockQuote, BulletList, OrderedList, and
-- DefinitionList. @Plain@ matters as much as @Para@: Pandoc renders
-- tight list items (the default @- item@ Markdown form) as @Plain@
-- blocks, so without it every link written in a tight list would be
-- invisible to the backlinks system.
extractLinksWithContext :: Pandoc -> [LinkEntry] extractLinksWithContext :: Pandoc -> [LinkEntry]
extractLinksWithContext (Pandoc _ blocks) = concatMap go blocks extractLinksWithContext (Pandoc _ blocks) = concatMap go blocks
where where
go :: Block -> [LinkEntry] go :: Block -> [LinkEntry]
go (Para inlines) = paraEntries inlines go (Para inlines) = paraEntries inlines
go (Plain inlines) = paraEntries inlines
go (BlockQuote bs) = concatMap go bs go (BlockQuote bs) = concatMap go bs
go (Div _ bs) = concatMap go bs go (Div _ bs) = concatMap go bs
go (BulletList items) = concatMap (concatMap go) items go (BulletList items) = concatMap (concatMap go) items
go (OrderedList _ items) = concatMap (concatMap go) items go (OrderedList _ items) = concatMap (concatMap go) items
go (DefinitionList defs) = concatMap defEntries defs
go _ = [] go _ = []
defEntries :: ([Inline], [[Block]]) -> [LinkEntry]
defEntries (term, bodies) =
paraEntries term ++ concatMap (concatMap go) bodies
paraEntries :: [Inline] -> [LinkEntry] paraEntries :: [Inline] -> [LinkEntry]
paraEntries inlines = paraEntries inlines =
let paraHtml = renderInlines inlines let paraHtml = renderInlines inlines
@ -268,17 +280,25 @@ linksCompiler = do
-- URL normalisation -- URL normalisation
-- --------------------------------------------------------------------------- -- ---------------------------------------------------------------------------
-- | Normalise an internal URL as a map key: strip query string, fragment, -- | Normalise an internal URL as a map key: strip query string and
-- and trailing @.html@; ensure a leading slash; percent-decode the path -- fragment; ensure a leading slash; strip a trailing @index.html@
-- so that @\/essays\/caf%C3%A9@ and @\/essays\/café@ collide on the same -- (keeping the directory slash) before the bare @.html@ extension, so a
-- key. -- page routed @essays\/foo\/index.html@ and a body link authored in the
-- canonical directory form @\/essays\/foo\/@ collide on the same key
-- (mirrors 'SimilarLinks.normaliseUrl'); percent-decode the path so that
-- @\/essays\/caf%C3%A9@ and @\/essays\/café@ collide on the same key.
--
-- Both sides of the backlink join go through this function: page keys
-- via 'backlinksFieldWith' (@normaliseUrl ("/" ++ route)@) and link
-- targets via 'targetKey' — so the two always agree.
normaliseUrl :: String -> String normaliseUrl :: String -> String
normaliseUrl url = normaliseUrl url =
let t = T.pack url let t = T.pack url
t1 = fst (T.breakOn "?" (fst (T.breakOn "#" t))) t1 = fst (T.breakOn "?" (fst (T.breakOn "#" t)))
t2 = if T.isPrefixOf "/" t1 then t1 else "/" `T.append` t1 t2 = if T.isPrefixOf "/" t1 then t1 else "/" `T.append` t1
t3 = fromMaybe t2 (T.stripSuffix ".html" t2) t3 = fromMaybe t2 (T.stripSuffix "index.html" t2)
in percentDecode (T.unpack t3) t4 = fromMaybe t3 (T.stripSuffix ".html" t3)
in percentDecode (T.unpack t4)
-- | Decode percent-escapes (@%XX@) into raw bytes, then re-interpret the -- | Decode percent-escapes (@%XX@) into raw bytes, then re-interpret the
-- resulting bytestring as UTF-8. Invalid escapes are passed through -- resulting bytestring as UTF-8. Invalid escapes are passed through


@ -72,6 +72,8 @@ parseBibExtras path = Map.fromList . parseBib <$> readFile' path
-- --------------------------------------------------------------------------- -- ---------------------------------------------------------------------------
-- | Enumerate all entries in a .bib file as (citekey, extra) pairs. -- | Enumerate all entries in a .bib file as (citekey, extra) pairs.
-- @\@string@ \/ @\@comment@ \/ @\@preamble@ blocks (case-insensitive)
-- carry no citekey and are skipped wholesale.
parseBib :: String -> [(String, BibExtra)] parseBib :: String -> [(String, BibExtra)]
parseBib input = go (dropTo '@' input) parseBib input = go (dropTo '@' input)
where where
@ -81,19 +83,26 @@ parseBib input = go (dropTo '@' input)
go [] = [] go [] = []
go ('@':rest) = go ('@':rest) =
let -- Entry type, then '{', then citekey, then ',', then fields, then '}'. let -- Entry type, then '{', then citekey, then ',', then fields, then '}'.
r1 = dropWhile isAlphaNum rest -- skip type name (typeName, r1) = span isAlphaNum rest
r2 = dropWhile isSpace r1 r2 = dropWhile isSpace r1
in case r2 of in case r2 of
'{':r3 -> '{':r3
let (citekey, r4) = span (\c -> c /= ',' && not (isSpace c)) r3 -- Not citekey entries: a @string macro name (or the body
r5 = dropWhile (\c -> c /= ',' && c /= '}') r4 -- of a @comment/@preamble) must never be parsed as a
in case r5 of -- citekey. Skip the balanced brace group and carry on.
',':r6 -> | map toLower typeName `elem` ["string", "comment", "preamble"] ->
let (flds, r7) = parseFields r6 let (_, r4) = readBraces 1 "" r3
in (trim citekey, toExtra flds) : go (dropTo '@' r7) in go (dropTo '@' r4)
-- Fieldless entries: walk past and carry on. | otherwise ->
'}':r6 -> (trim citekey, emptyBibExtra) : go (dropTo '@' r6) let (citekey, r4) = span (\c -> c /= ',' && not (isSpace c)) r3
_ -> [] r5 = dropWhile (\c -> c /= ',' && c /= '}') r4
in case r5 of
',':r6 ->
let (flds, r7) = parseFields r6
in (trim citekey, toExtra flds) : go (dropTo '@' r7)
-- Fieldless entries: walk past and carry on.
'}':r6 -> (trim citekey, emptyBibExtra) : go (dropTo '@' r6)
_ -> []
_ -> go (dropTo '@' r2) _ -> go (dropTo '@' r2)
go (_:rest) = go (dropTo '@' rest) go (_:rest) = go (dropTo '@' rest)


@ -99,7 +99,12 @@ parseCatalogEntry item = do
year = parseYear meta year = parseYear meta
dur = lookupString "duration" meta dur = lookupString "duration" meta
instr = lookupString "instrumentation" meta instr = lookupString "instrumentation" meta
cat = fromMaybe "other" (lookupString "category" meta) -- Fold unknown categories into the canonical "other"
-- bucket here: two distinct unknown values share a rank
-- but would groupBy into separate groups, rendering as
-- adjacent duplicate "Other" sections.
rawCat = fromMaybe "other" (lookupString "category" meta)
cat = if rawCat `elem` categoryOrder then rawCat else "other"
return $ Just CatalogEntry return $ Just CatalogEntry
{ ceTitle = title { ceTitle = title
, ceUrl = url , ceUrl = url
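
Review note: the duplicate-section problem described in the new comment can be reproduced with plain lists. This is a minimal sketch under stated assumptions: the contents of `categoryOrder` and the `rank`, `foldCategory`, and `sections` helpers are hypothetical stand-ins, not the module's real definitions.

```haskell
import Data.List (elemIndex, groupBy, sortOn)
import Data.Maybe (fromMaybe)

-- Hypothetical category list; the real one lives in the catalog module.
categoryOrder :: [String]
categoryOrder = ["orchestral", "chamber", "other"]

-- Unknown categories all share one rank past the end of the list.
rank :: String -> Int
rank c = fromMaybe (length categoryOrder) (elemIndex c categoryOrder)

-- The fix from the diff: fold anything unknown into "other".
foldCategory :: String -> String
foldCategory c = if c `elem` categoryOrder then c else "other"

-- Sort by rank, then group adjacent equal categories into sections.
sections :: [String] -> [[String]]
sections = groupBy (==) . sortOn rank
```

Given the raw values `["chamber", "tape", "other", "film"]`, `sections` yields four groups, since `tape` and `film` share a rank but are not equal. After `map foldCategory` the same input yields two groups.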


@ -9,7 +9,8 @@ module Commonplace
import Data.Aeson (FromJSON (..), withObject, (.:), (.:?), (.!=)) import Data.Aeson (FromJSON (..), withObject, (.:), (.:?), (.!=))
import Data.List (nub, sortBy) import Data.List (nub, sortBy)
import Data.Ord (comparing, Down (..)) import Data.Ord (comparing, Down (..))
import qualified Data.ByteString.Char8 as BS import qualified Data.Text as T
import qualified Data.Text.Encoding as TE
import qualified Data.Yaml as Y import qualified Data.Yaml as Y
import Hakyll hiding (escapeHtml, renderTags) import Hakyll hiding (escapeHtml, renderTags)
import Contexts (siteCtx) import Contexts (siteCtx)
@ -140,7 +141,10 @@ loadCommonplace :: Compiler [CPEntry]
loadCommonplace = do loadCommonplace = do
rawItem <- load (fromFilePath "data/commonplace.yaml") :: Compiler (Item String) rawItem <- load (fromFilePath "data/commonplace.yaml") :: Compiler (Item String)
let raw = itemBody rawItem let raw = itemBody rawItem
case Y.decodeEither' (BS.pack raw) of -- encodeUtf8, not Char8.pack: Char8 truncates each Char to 8 bits,
-- silently corrupting any codepoint above 0x7F (same hazard Now.hs
-- documents — em-dash 0x2014 would become control char 0x14).
case Y.decodeEither' (TE.encodeUtf8 (T.pack raw)) of
Left err -> fail ("commonplace.yaml: " ++ show err) Left err -> fail ("commonplace.yaml: " ++ show err)
Right entries -> return entries Right entries -> return entries
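
Review note: the truncation hazard named in the comment is easy to confirm byte by byte. The sketch below compares the two packing routes on an em-dash (U+2014); `viaChar8` and `viaUtf8` are hypothetical helper names used only for this illustration.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE
import Data.Word (Word8)

-- Char8.pack keeps only the low 8 bits of each Char.
viaChar8 :: String -> [Word8]
viaChar8 = B.unpack . BC.pack

-- encodeUtf8 emits the full multi-byte UTF-8 sequence.
viaUtf8 :: String -> [Word8]
viaUtf8 = B.unpack . TE.encodeUtf8 . T.pack
```

For the input `"\x2014"`, `viaChar8` gives the single byte `0x14`, while `viaUtf8` gives the three bytes `0xE2 0x80 0x94` that a YAML decoder expects.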


@ -22,6 +22,7 @@ module Contexts
, recentFirstByDisplay , recentFirstByDisplay
, Revision (..) , Revision (..)
, getRevisions , getRevisions
, isProvedConfidence
) where ) where
import Data.Aeson (Value (..)) import Data.Aeson (Value (..))
@ -86,7 +87,12 @@ affiliationField = listFieldWith "affiliation-links" ctx $ \item -> do
let entries = case lookupStringList "affiliation" meta of let entries = case lookupStringList "affiliation" meta of
Just xs -> xs Just xs -> xs
Nothing -> maybe [] (:[]) (lookupString "affiliation" meta) Nothing -> maybe [] (:[]) (lookupString "affiliation" meta)
return $ map (Item (fromFilePath "") . parseEntry) entries -- noResult, not an empty list: Hakyll's $if$ treats an empty
-- ListField as truthy, so returning [] would render the wrapper
-- markup (an empty .meta-affiliation row) on every page.
if null entries
then noResult "no affiliation"
else return $ map (Item (fromFilePath "") . parseEntry) entries
where where
ctx = field "affiliation-name" (return . fst . itemBody) ctx = field "affiliation-name" (return . fst . itemBody)
<> field "affiliation-url" (\i -> let u = snd (itemBody i) <> field "affiliation-url" (\i -> let u = snd (itemBody i)
@ -170,10 +176,17 @@ pageScriptsField = listFieldWith "page-scripts" ctx $ \item -> do
-- | List context field exposing an item's own (non-expanded) tags as -- | List context field exposing an item's own (non-expanded) tags as
-- @tag-name@ / @tag-url@ objects. -- @tag-name@ / @tag-url@ objects.
-- --
-- Fails with 'noResult' when the item has no tags — same discipline
-- as the @Excluding@ variants below — so @$if(...)$@ gates are false
-- and templates don't emit empty tag-wrapper markup.
--
-- $for(essay-tags)$<a href="$tag-url$">$tag-name$</a>$endfor$ -- $for(essay-tags)$<a href="$tag-url$">$tag-name$</a>$endfor$
tagLinksField :: String -> Context a tagLinksField :: String -> Context a
tagLinksField fieldName = listFieldWith fieldName ctx $ \item -> tagLinksField fieldName = listFieldWith fieldName ctx $ \item -> do
map toItem <$> getTags (itemIdentifier item) ts <- getTags (itemIdentifier item)
if null ts
then noResult "no tags"
else return (map toItem ts)
where where
toItem t = Item (fromFilePath (t ++ "/index.html")) t toItem t = Item (fromFilePath (t ++ "/index.html")) t
ctx = field "tag-name" (return . itemBody) ctx = field "tag-name" (return . itemBody)
@ -345,7 +358,7 @@ abstractField :: Context String
abstractField = field "abstract" $ \item -> do abstractField = field "abstract" $ \item -> do
meta <- getMetadata (itemIdentifier item) meta <- getMetadata (itemIdentifier item)
case lookupString "abstract" meta of case lookupString "abstract" meta of
Nothing -> fail "no abstract" Nothing -> noResult "no abstract"
Just src -> do Just src -> do
let pandocResult = runPure $ do let pandocResult = runPure $ do
doc <- readMarkdown defaultHakyllReaderOptions (T.pack src) doc <- readMarkdown defaultHakyllReaderOptions (T.pack src)
@ -379,7 +392,7 @@ descriptionField :: Context String
descriptionField = field "description" $ \item -> do descriptionField = field "description" $ \item -> do
meta <- getMetadata (itemIdentifier item) meta <- getMetadata (itemIdentifier item)
case lookupString "abstract" meta of case lookupString "abstract" meta of
Nothing -> fail "no abstract" Nothing -> noResult "no abstract"
Just src -> do Just src -> do
let pandocResult = runPure $ do let pandocResult = runPure $ do
doc <- readMarkdown defaultHakyllReaderOptions (T.pack src) doc <- readMarkdown defaultHakyllReaderOptions (T.pack src)
@ -416,7 +429,7 @@ summaryField :: Context String
summaryField = field "summary" $ \item -> do summaryField = field "summary" $ \item -> do
meta <- getMetadata (itemIdentifier item) meta <- getMetadata (itemIdentifier item)
case lookupString "summary" meta of case lookupString "summary" meta of
Nothing -> fail "no summary" Nothing -> noResult "no summary"
Just src -> do Just src -> do
let pandocResult = runPure $ do let pandocResult = runPure $ do
doc <- readMarkdown defaultHakyllReaderOptions (T.pack src) doc <- readMarkdown defaultHakyllReaderOptions (T.pack src)
@ -462,11 +475,11 @@ bibliographyField = bibContent <> hasCitations
where where
bibContent = field "bibliography" $ \item -> do bibContent = field "bibliography" $ \item -> do
bib <- itemBody <$> loadSnapshot (itemIdentifier item) "bibliography" bib <- itemBody <$> loadSnapshot (itemIdentifier item) "bibliography"
if null bib then fail "no bibliography" else return bib if null bib then noResult "no bibliography" else return bib
hasCitations = field "has-citations" $ \item -> do hasCitations = field "has-citations" $ \item -> do
bib <- itemBody <$> (loadSnapshot (itemIdentifier item) "bibliography" bib <- itemBody <$> (loadSnapshot (itemIdentifier item) "bibliography"
:: Compiler (Item String)) :: Compiler (Item String))
if null bib then fail "no citations" else return "true" if null bib then noResult "no citations" else return "true"
-- | Further-reading field: loads the further-reading HTML saved by essayCompiler. -- | Further-reading field: loads the further-reading HTML saved by essayCompiler.
-- Returns noResult (making $if(further-reading-refs)$ false) when empty. -- Returns noResult (making $if(further-reading-refs)$ false) when empty.
@ -474,22 +487,26 @@ furtherReadingField :: Context String
furtherReadingField = field "further-reading-refs" $ \item -> do furtherReadingField = field "further-reading-refs" $ \item -> do
fr <- itemBody <$> (loadSnapshot (itemIdentifier item) "further-reading-refs" fr <- itemBody <$> (loadSnapshot (itemIdentifier item) "further-reading-refs"
:: Compiler (Item String)) :: Compiler (Item String))
if null fr then fail "no further reading" else return fr if null fr then noResult "no further reading" else return fr
-- --------------------------------------------------------------------------- -- ---------------------------------------------------------------------------
-- Epistemic fields -- Epistemic fields
-- --------------------------------------------------------------------------- -- ---------------------------------------------------------------------------
-- | Render an integer 1-5 frontmatter key as filled/empty dot chars. -- | Render an integer 1-5 frontmatter key as filled/empty dot chars.
-- Returns @noResult@ when the key is absent or unparseable. -- Returns @noResult@ when the key is absent, unparseable, or below 1
-- (a zero would otherwise render five empty circles); values above 5
-- clamp to 5.
dotsField :: String -> String -> Context String dotsField :: String -> String -> Context String
dotsField ctxKey metaKey = field ctxKey $ \item -> do dotsField ctxKey metaKey = field ctxKey $ \item -> do
meta <- getMetadata (itemIdentifier item) meta <- getMetadata (itemIdentifier item)
case lookupString metaKey meta >>= readMaybe of case lookupString metaKey meta >>= readMaybe of
Nothing -> fail (ctxKey ++ ": not set") Nothing -> noResult (ctxKey ++ ": not set")
Just (n :: Int) -> Just (n :: Int)
let v = max 0 (min 5 n) | n < 1 -> noResult (ctxKey ++ ": value below the 1-5 scale")
in return (replicate v '\x25CF' ++ replicate (5 - v) '\x25CB') | otherwise ->
let v = min 5 n
in return (replicate v '\x25CF' ++ replicate (5 - v) '\x25CB')
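
Review note: the clamping rule in the new `dotsField` can be checked without Hakyll. In the sketch below, `renderDots` is a hypothetical pure stand-in in which `Nothing` plays the role of `noResult`.

```haskell
-- Hypothetical pure version of the dot rendering: Nothing stands in
-- for noResult, Just carries the five-character dot string.
renderDots :: Int -> Maybe String
renderDots n
  | n < 1     = Nothing                 -- below the 1-5 scale: no output
  | otherwise =
      let v = min 5 n                   -- values above 5 clamp to 5
      in Just (replicate v '\x25CF' ++ replicate (5 - v) '\x25CB')
```

Under this reading, `renderDots 0` produces nothing at all instead of five empty circles, `renderDots 3` gives three filled and two empty dots, and `renderDots 9` gives five filled dots.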
-- | @$confidence-trend$@: ↑, ↓, or → derived from the last two entries -- | @$confidence-trend$@: ↑, ↓, or → derived from the last two entries
-- in the @confidence-history@ frontmatter list. Returns @noResult@ when -- in the @confidence-history@ frontmatter list. Returns @noResult@ when
@ -513,11 +530,11 @@ confidenceTrendField = field "confidence-trend" $ \item -> do
"[Marks] " ++ toFilePath (itemIdentifier item) ++ "[Marks] " ++ toFilePath (itemIdentifier item) ++
": confidence: proved is incompatible with confidence-history; ignoring history" ": confidence: proved is incompatible with confidence-history; ignoring history"
Nothing -> return () Nothing -> return ()
fail "confidence is proved; trend suppressed" noResult "confidence is proved; trend suppressed"
else case lookupStringList "confidence-history" meta of else case lookupStringList "confidence-history" meta of
Nothing -> fail "no confidence history" Nothing -> noResult "no confidence history"
Just xs -> case lastTwo xs of Just xs -> case lastTwo xs of
Nothing -> fail "no confidence history" Nothing -> noResult "no confidence history"
Just (prevS, curS) -> Just (prevS, curS) ->
let prev = readMaybe prevS :: Maybe Int let prev = readMaybe prevS :: Maybe Int
cur = readMaybe curS :: Maybe Int cur = readMaybe curS :: Maybe Int
@ -583,7 +600,7 @@ overallScoreField = field "overall-score" $ \item -> do
+ fromIntegral (ev - 1) / 4.0 * 0.4 + fromIntegral (ev - 1) / 4.0 * 0.4
score = max 0 (min 100 (round (raw * 100.0) :: Int)) score = max 0 (min 100 (round (raw * 100.0) :: Int))
in return (show score) in return (show score)
_ -> fail "overall-score: confidence or evidence not set" _ -> noResult "overall-score: confidence or evidence not set"
-- | @$confidence$@: numeric override that suppresses the @proved@ / -- | @$confidence$@: numeric override that suppresses the @proved@ /
-- @proven@ sentinel. When the frontmatter value is parseable as an -- @proven@ sentinel. When the frontmatter value is parseable as an
@ -996,7 +1013,7 @@ compositionCtx =
hasScoreField = field "has-score" $ \item -> do hasScoreField = field "has-score" $ \item -> do
meta <- getMetadata (itemIdentifier item) meta <- getMetadata (itemIdentifier item)
let pages = fromMaybe [] (lookupStringList "score-pages" meta) let pages = fromMaybe [] (lookupStringList "score-pages" meta)
if null pages then fail "no score pages" else return "true" if null pages then noResult "no score pages" else return "true"
scorePageCountField = field "score-page-count" $ \item -> do scorePageCountField = field "score-page-count" $ \item -> do
meta <- getMetadata (itemIdentifier item) meta <- getMetadata (itemIdentifier item)
@ -1014,7 +1031,7 @@ compositionCtx =
hasMovementsField = field "has-movements" $ \item -> do hasMovementsField = field "has-movements" $ \item -> do
meta <- getMetadata (itemIdentifier item) meta <- getMetadata (itemIdentifier item)
if null (parseMovements meta) then fail "no movements" else return "true" if null (parseMovements meta) then noResult "no movements" else return "true"
movementsListField = listFieldWith "movements" movCtx $ \item -> do movementsListField = listFieldWith "movements" movCtx $ \item -> do
meta <- getMetadata (itemIdentifier item) meta <- getMetadata (itemIdentifier item)
@ -1032,9 +1049,9 @@ compositionCtx =
<> field "movement-page" (return . show . movPage . itemBody) <> field "movement-page" (return . show . movPage . itemBody)
<> field "movement-duration" (return . movDuration . itemBody) <> field "movement-duration" (return . movDuration . itemBody)
<> field "movement-audio" <> field "movement-audio"
(\i -> maybe (fail "no audio") return (movAudio (itemBody i))) (\i -> maybe (noResult "no audio") return (movAudio (itemBody i)))
<> field "has-audio" <> field "has-audio"
(\i -> maybe (fail "no audio") (const (return "true")) (\i -> maybe (noResult "no audio") (const (return "true"))
(movAudio (itemBody i))) (movAudio (itemBody i)))
-- --------------------------------------------------------------------------- -- ---------------------------------------------------------------------------


@ -30,22 +30,45 @@ import Text.Pandoc.Walk (walk)
import ArchiveIndex (ArchiveStatus (..), archiveIndexIsEmpty, import ArchiveIndex (ArchiveStatus (..), archiveIndexIsEmpty,
archiveSlugFor, archiveStatusForSlug) archiveSlugFor, archiveStatusForSlug)
-- | Annotate body links. Headings are left alone — an affordance there -- | Annotate body links. Links inside headings are left alone at
-- would be noise. Identity when the index is empty. -- /every/ nesting depth — an affordance there would be noise, and a
-- top-level pattern match would miss a @Header@ inside a @Div@ or
-- @BlockQuote@. Header links are tagged with a sentinel class before
-- the annotation walk and stripped of it afterwards, so the sentinel
-- can never leak into the writer. Identity when the index is empty.
apply :: Pandoc -> Pandoc apply :: Pandoc -> Pandoc
apply doc@(Pandoc meta blocks) apply doc
| archiveIndexIsEmpty = doc | archiveIndexIsEmpty = doc
| otherwise = Pandoc meta (map annotateBlock blocks) | otherwise =
walk unprotectLink . walk annotateInlines . walk protectHeader $ doc
annotateBlock :: Block -> Block -- | Sentinel class marking a link the annotation walk must skip. It
annotateBlock h@Header{} = h -- only exists between the protect and unprotect walks inside 'apply'.
annotateBlock b = walk annotateInlines b skipClass :: T.Text
skipClass = "archive-header-skip"
protectHeader :: Block -> Block
protectHeader (Header lvl attr ils) = Header lvl attr (walk protect ils)
where
protect (Link (ident, cls, kvs) text target) =
Link (ident, skipClass : cls, kvs) text target
protect x = x
protectHeader b = b
unprotectLink :: Inline -> Inline
unprotectLink (Link (ident, cls, kvs) text target)
| skipClass `elem` cls =
Link (ident, filter (/= skipClass) cls, kvs) text target
unprotectLink x = x
-- | For each archived @Link@: flip it if the target is 'Rotted', else -- | For each archived @Link@: flip it if the target is 'Rotted', else
-- append the affordance. Non-archived links pass through untouched. -- append the affordance. Non-archived links — and links protected by
-- 'protectHeader' — pass through untouched.
annotateInlines :: [Inline] -> [Inline] annotateInlines :: [Inline] -> [Inline]
annotateInlines = concatMap expand annotateInlines = concatMap expand
where where
expand l@(Link (_, cls, _) _ _)
| skipClass `elem` cls = [l]
expand l@(Link attr text (url, _)) = expand l@(Link attr text (url, _)) =
case archiveSlugFor url of case archiveSlugFor url of
Nothing -> [l] Nothing -> [l]


@ -12,15 +12,23 @@
-- --
-- The file path must be root-relative (begins with @/@). -- The file path must be root-relative (begins with @/@).
-- PDF.js is expected to be vendored at @/pdfjs/web/viewer.html@. -- PDF.js is expected to be vendored at @/pdfjs/web/viewer.html@.
--
-- Code protection (honest scope): lines inside /fenced/ code blocks
-- are passed through untouched ('Filters.Wikilinks.mapOutsideFences'),
-- so fenced examples can show @{{pdf:…}}@ literally. Indented code
-- blocks and inline code spans are NOT recognised — a full-line
-- directive inside either is still rewritten.
module Filters.EmbedPdf (preprocess) where module Filters.EmbedPdf (preprocess) where
import Data.Char (isDigit) import Data.Char (isDigit)
import Data.List (isPrefixOf, isSuffixOf) import Data.List (isPrefixOf, isSuffixOf)
import Filters.Wikilinks (mapOutsideFences)
import qualified Utils as U import qualified Utils as U
-- | Apply PDF-embed substitution to the raw Markdown source string. -- | Apply PDF-embed substitution to the raw Markdown source string,
-- skipping lines inside fenced code blocks.
preprocess :: String -> String preprocess :: String -> String
preprocess = unlines . map processLine . lines preprocess = mapOutsideFences processLine
processLine :: String -> String processLine :: String -> String
processLine line = processLine line =


@ -231,7 +231,7 @@ renderPicture :: Attr -> [Inline] -> Target -> Bool -> Maybe (Int, Int) -> Text
renderPicture (ident, classes, kvs) alt (src, title) lightbox dims = renderPicture (ident, classes, kvs) alt (src, title) lightbox dims =
T.concat T.concat
[ "<picture>" [ "<picture>"
, "<source srcset=\"", T.pack webpSrc, "\" type=\"image/webp\">" , "<source srcset=\"", esc (T.pack webpSrc), "\" type=\"image/webp\">"
, "<img" , "<img"
, attrId ident , attrId ident
, attrClasses classes , attrClasses classes


@ -16,8 +16,11 @@ import Text.Pandoc.Definition
import Text.Pandoc.Walk (walk) import Text.Pandoc.Walk (walk)
-- | Apply link classification to the entire document. -- | Apply link classification to the entire document.
-- Two passes: PDF links first (rewrites href to viewer URL), then external -- Two passes: PDF links first (rewrites href to the viewer URL and tags
-- link classification (operates on http/https, so no overlap). -- the anchor @pdf-link@), then general classification. The second pass
-- explicitly skips anchors the PDF pass already claimed — the viewer URL
-- is root-relative, so without that guard it would also be classified as
-- an internal page link and get double chrome.
apply :: Pandoc -> Pandoc apply :: Pandoc -> Pandoc
apply = walk classifyLink . walk classifyPdfLink apply = walk classifyLink . walk classifyPdfLink
@ -49,6 +52,11 @@ classifyLink l@(Link (_, classes, _) _ _)
-- brand icon stamp, and have their own popup provider. Leave them -- brand icon stamp, and have their own popup provider. Leave them
-- entirely alone. -- entirely alone.
| "source-ref" `elem` classes = l | "source-ref" `elem` classes = l
-- PDF links were already rewritten to the (root-relative) viewer URL
-- and given their own chrome by 'classifyPdfLink' in the preceding
-- pass; without this guard they would be double-classified as
-- internal page links.
| "pdf-link" `elem` classes = l
classifyLink (Link (ident, classes, kvs) ils (url, title)) classifyLink (Link (ident, classes, kvs) ils (url, title))
| isExternal url = | isExternal url =
let icon = domainIcon url let icon = domainIcon url
@ -100,8 +108,9 @@ isExternal url =
where where
siteHost = "levineuwirth.org" siteHost = "levineuwirth.org"
-- | Extract the lowercased hostname from an absolute http(s) URL. -- | Extract the lowercased hostname from an absolute http(s) URL,
-- Returns 'Nothing' for non-http(s) URLs (relative paths, mailto:, etc.). -- stripping any userinfo (@user:pass\@@) and port. Returns 'Nothing'
-- for non-http(s) URLs (relative paths, mailto:, etc.).
extractHost :: Text -> Maybe Text extractHost :: Text -> Maybe Text
extractHost url extractHost url
| Just rest <- T.stripPrefix "https://" url = Just (hostOf rest) | Just rest <- T.stripPrefix "https://" url = Just (hostOf rest)
@ -109,45 +118,60 @@ extractHost url
| otherwise = Nothing | otherwise = Nothing
where where
hostOf rest = hostOf rest =
let withPort = T.takeWhile (\c -> c /= '/' && c /= '?' && c /= '#') rest let authority = T.takeWhile (\c -> c /= '/' && c /= '?' && c /= '#') rest
host = T.takeWhile (/= ':') withPort -- 'T.breakOnEnd' yields the segment after the last @\@@, or
-- the whole authority when there is no userinfo.
(_, hostPort) = T.breakOnEnd "@" authority
host = T.takeWhile (/= ':') hostPort
in T.toLower host in T.toLower host
-- | Icon name for the link, matching a file in /images/link-icons/<name>.svg. -- | Icon name for the link, matching a file in /images/link-icons/<name>.svg.
--
-- Matches on the URL's host only, never on the full URL — a path like
-- @https://example.org/why-x.com-failed@ must not get the Twitter
-- icon. URLs with no extractable host get the generic icon.
domainIcon :: Text -> Text domainIcon :: Text -> Text
domainIcon url domainIcon url = maybe "external" iconForHost (extractHost url)
iconForHost :: Text -> Text
iconForHost host
-- Scholarly / reference -- Scholarly / reference
| "wikipedia.org" `T.isInfixOf` url = "wikipedia" | m "wikipedia.org" = "wikipedia"
| "arxiv.org" `T.isInfixOf` url = "arxiv" | m "arxiv.org" = "arxiv"
| "doi.org" `T.isInfixOf` url = "doi" | m "doi.org" = "doi"
| "worldcat.org" `T.isInfixOf` url = "worldcat" | m "worldcat.org" = "worldcat"
| "orcid.org" `T.isInfixOf` url = "orcid" | m "orcid.org" = "orcid"
| "archive.org" `T.isInfixOf` url = "internet-archive" | m "archive.org" = "internet-archive"
-- Code / software -- Code / software
| "github.com" `T.isInfixOf` url = "github" | m "github.com" = "github"
| "git.levineuwirth.org" `T.isInfixOf` url = "forgejo" | m "git.levineuwirth.org" = "forgejo"
| "tensorflow.org" `T.isInfixOf` url = "tensorflow" | m "tensorflow.org" = "tensorflow"
-- AI companies (consumer products share a brand icon with the lab) -- AI companies (consumer products share a brand icon with the lab)
| "anthropic.com" `T.isInfixOf` url = "anthropic" | m "anthropic.com" = "anthropic"
| "claude.ai" `T.isInfixOf` url = "anthropic" | m "claude.ai" = "anthropic"
| "openai.com" `T.isInfixOf` url = "openai" | m "openai.com" = "openai"
| "chatgpt.com" `T.isInfixOf` url = "openai" | m "chatgpt.com" = "openai"
-- Social / media -- Social / media
| "twitter.com" `T.isInfixOf` url = "twitter" | m "twitter.com" = "twitter"
| "x.com" `T.isInfixOf` url = "twitter" | m "x.com" = "twitter"
| "reddit.com" `T.isInfixOf` url = "reddit" | m "reddit.com" = "reddit"
| "youtube.com" `T.isInfixOf` url = "youtube" | m "youtube.com" = "youtube"
| "youtu.be" `T.isInfixOf` url = "youtube" | m "youtu.be" = "youtube"
| "tiktok.com" `T.isInfixOf` url = "tiktok" | m "tiktok.com" = "tiktok"
| "substack.com" `T.isInfixOf` url = "substack" | m "substack.com" = "substack"
| "news.ycombinator.com" `T.isInfixOf` url = "hacker-news" | m "news.ycombinator.com" = "hacker-news"
| "lesswrong.com" `T.isInfixOf` url = "lesswrong" | m "lesswrong.com" = "lesswrong"
-- News -- News
| "nytimes.com" `T.isInfixOf` url = "new-york-times" | m "nytimes.com" = "new-york-times"
-- Institutions -- Institutions
| "nasa.gov" `T.isInfixOf` url = "nasa" | m "nasa.gov" = "nasa"
| "apple.com" `T.isInfixOf` url = "apple" | m "apple.com" = "apple"
| otherwise = "external" | otherwise = "external"
where
-- Label-suffix match: the host is the domain itself or a subdomain
-- of it. Never fires on a lookalike label (@notx.com@) or on text
-- in the path or query.
m d = host == d || ("." <> d) `T.isSuffixOf` host
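
Review note: the host extraction and label-suffix match can be tried together outside the filter. This sketch copies `hostOf` from the hunk above and lifts the local `m` helper to a top-level function; the name `matchesDomain` is an assumption made for this illustration.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Text (Text)
import qualified Data.Text as T

-- Copied from the diff: take the authority, drop userinfo and port.
hostOf :: Text -> Text
hostOf rest =
  let authority = T.takeWhile (\c -> c /= '/' && c /= '?' && c /= '#') rest
      -- breakOnEnd returns the text after the last '@', or the whole
      -- authority when there is no userinfo.
      (_, hostPort) = T.breakOnEnd "@" authority
      host = T.takeWhile (/= ':') hostPort
  in T.toLower host

-- Hypothetical top-level version of the local helper m: the host is
-- the domain itself or a subdomain of it.
matchesDomain :: Text -> Text -> Bool
matchesDomain d host = host == d || ("." <> d) `T.isSuffixOf` host
```

Here `hostOf` takes the URL with its scheme already stripped, as in `extractHost`. A host of `mobile.x.com` matches `x.com`, while `notx.com` and a path such as `example.org/why-x.com-failed` do not.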
-- | Percent-encode characters that would break a @?file=@ query-string value. -- | Percent-encode characters that would break a @?file=@ query-string value.
-- Slashes are intentionally left unencoded so root-relative paths remain -- Slashes are intentionally left unencoded so root-relative paths remain


@ -15,6 +15,7 @@
module Filters.Score (inlineScores) where module Filters.Score (inlineScores) where
import Control.Exception (IOException, try) import Control.Exception (IOException, try)
import Data.Char (isHexDigit)
import Data.Maybe (listToMaybe) import Data.Maybe (listToMaybe)
import qualified Data.Text as T import qualified Data.Text as T
import qualified Data.Text.IO as TIO import qualified Data.Text.IO as TIO
@ -86,25 +87,48 @@ findImagePath blocks = listToMaybe
-- | Replace hardcoded black fill/stroke values with @currentColor@ so the -- | Replace hardcoded black fill/stroke values with @currentColor@ so the
-- SVG inherits the CSS @color@ property in both light and dark modes. -- SVG inherits the CSS @color@ property in both light and dark modes.
-- --
-- 6-digit hex patterns are at the bottom of the composition chain -- Quoted attribute forms (@fill="#000"@) are self-delimiting — the
-- (applied first) so they are replaced before the 3-digit shorthand, -- closing quote bounds the match — so plain 'T.replace' is safe for
-- preventing partial matches (e.g. @#000@ matching the prefix of @#000000@). -- them. Unquoted style-property forms (@fill:#000@) are not: naive
-- replacement would also fire on the prefix of a longer hex colour
-- (@fill:#000080@ → @fill:currentColor80@, invalid CSS). Those go
-- through 'replaceHexColor', which rewrites a match only when it is
-- not followed by another hex digit; the boundary check also makes
-- the 3-digit/6-digit application order irrelevant.
processColors :: T.Text -> T.Text processColors :: T.Text -> T.Text
processColors processColors
-- 3-digit hex and keyword patterns (applied after 6-digit replacements) -- 3-digit hex and keyword patterns
= T.replace "fill=\"#000\"" "fill=\"currentColor\"" = T.replace "fill=\"#000\"" "fill=\"currentColor\""
. T.replace "fill=\"black\"" "fill=\"currentColor\"" . T.replace "fill=\"black\"" "fill=\"currentColor\""
. T.replace "stroke=\"#000\"" "stroke=\"currentColor\"" . T.replace "stroke=\"#000\"" "stroke=\"currentColor\""
. T.replace "stroke=\"black\"" "stroke=\"currentColor\"" . T.replace "stroke=\"black\"" "stroke=\"currentColor\""
. T.replace "fill:#000" "fill:currentColor" . replaceHexColor "fill:#000" "fill:currentColor"
. T.replace "fill:black" "fill:currentColor" . T.replace "fill:black" "fill:currentColor"
. T.replace "stroke:#000" "stroke:currentColor" . replaceHexColor "stroke:#000" "stroke:currentColor"
. T.replace "stroke:black" "stroke:currentColor" . T.replace "stroke:black" "stroke:currentColor"
-- 6-digit hex patterns (applied first — bottom of the chain) -- 6-digit hex patterns (applied first — bottom of the chain)
. T.replace "fill=\"#000000\"" "fill=\"currentColor\"" . T.replace "fill=\"#000000\"" "fill=\"currentColor\""
. T.replace "stroke=\"#000000\"" "stroke=\"currentColor\"" . T.replace "stroke=\"#000000\"" "stroke=\"currentColor\""
. T.replace "fill:#000000" "fill:currentColor" . replaceHexColor "fill:#000000" "fill:currentColor"
. T.replace "stroke:#000000" "stroke:currentColor" . replaceHexColor "stroke:#000000" "stroke:currentColor"
-- | 'T.replace' restricted to hex-boundary-terminated matches: an
-- occurrence of @needle@ is rewritten only when the character after
-- it is not another hex digit, so @fill:#000@ never fires inside the
-- longer colours @fill:#0008@, @fill:#000080@, or @fill:#00000080@.
replaceHexColor :: T.Text -> T.Text -> T.Text -> T.Text
replaceHexColor needle replacement = go
where
go t =
let (pre, rest) = T.breakOn needle t
in if T.null rest
then pre
else
let after = T.drop (T.length needle) rest
in case T.uncons after of
Just (c, _) | isHexDigit c ->
pre <> needle <> go after
_ -> pre <> replacement <> go after
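
Review note: the boundary check is easiest to see on concrete inputs. The sketch below reproduces `replaceHexColor` from the hunk above as a standalone definition so it can be run on its own; nothing beyond `Data.Text` and `Data.Char` is assumed.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Char (isHexDigit)
import qualified Data.Text as T

-- Standalone copy of the function in the diff: rewrite needle only
-- when the next character is not another hex digit.
replaceHexColor :: T.Text -> T.Text -> T.Text -> T.Text
replaceHexColor needle replacement = go
  where
    go t =
      let (pre, rest) = T.breakOn needle t
      in if T.null rest
           then pre
           else
             let after = T.drop (T.length needle) rest
             in case T.uncons after of
                  -- Followed by a hex digit: part of a longer colour.
                  Just (c, _) | isHexDigit c -> pre <> needle <> go after
                  -- End of input or a non-hex character: safe to rewrite.
                  _ -> pre <> replacement <> go after
```

Applied with the needle `fill:#000`, the input `fill:#000;` becomes `fill:currentColor;`, while `fill:#000080` is returned unchanged.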
buildHtml :: Maybe T.Text -> Maybe T.Text -> T.Text -> T.Text buildHtml :: Maybe T.Text -> Maybe T.Text -> T.Text -> T.Text
buildHtml mName mCaption svgContent = T.concat buildHtml mName mCaption svgContent = T.concat


@ -4,12 +4,23 @@
-- --
-- Each footnote becomes: -- Each footnote becomes:
-- * A @<sup class="sidenote-ref">@ anchor in the body text. -- * A @<sup class="sidenote-ref">@ anchor in the body text.
-- * An @<aside class="sidenote">@ immediately following it, containing -- * A @<span class="sidenote">@ immediately following it, containing
-- the rendered note content. -- the rendered note content.
-- --
-- On wide viewports, sidenotes.css floats asides into the right margin. -- Additionally, every consumed note is re-emitted in a
-- On narrow viewports they are hidden; the standard Pandoc-generated -- @<section class="footnotes">@ appended at the document end. The
-- @<section class="footnotes">@ at the document end serves as fallback. -- filter swallows Pandoc's own @Note@ inlines, so Pandoc's writer
-- never produces that section itself — without this re-emission,
-- narrow viewports with JavaScript disabled (where sidenotes.css
-- hides @.sidenote@ and sidenotes.js's bottom sheet never runs)
-- would lose footnote content entirely.
--
-- On wide viewports, sidenotes.css floats the spans into the right
-- margin and hides @section.footnotes@; on narrow viewports the
-- spans are hidden and the section is shown. The in-text anchor
-- targets the footnotes item (the only target visible on narrow
-- no-JS viewports); sidenotes.js intercepts clicks and pairs
-- ref\/note by element id, so the href is purely the no-JS path.
module Filters.Sidenotes (apply) where module Filters.Sidenotes (apply) where
import Control.Monad.State.Strict import Control.Monad.State.Strict
@ -18,21 +29,58 @@ import Data.Text (Text)
import qualified Data.Text as T import qualified Data.Text as T
import Text.Pandoc.Class (runPure) import Text.Pandoc.Class (runPure)
import Text.Pandoc.Definition import Text.Pandoc.Definition
import Text.Pandoc.Options (WriterOptions) import Text.Pandoc.Options (WriterOptions (..),
HTMLMathMethod (KaTeX))
import Text.Pandoc.Walk (walkM) import Text.Pandoc.Walk (walkM)
import Text.Pandoc.Writers.HTML (writeHtml5String) import Text.Pandoc.Writers.HTML (writeHtml5String)
-- | Transform all @Note@ inlines in the document to inline sidenote HTML. -- | Accumulator: next label counter plus collected notes
apply :: Pandoc -> Pandoc -- (newest-first; reversed before rendering the fallback section).
apply doc = evalState (walkM convertNote doc) (1 :: Int) type NoteState = (Int, [(Text, [Block])])
convertNote :: Inline -> State Int Inline -- | Transform all @Note@ inlines in the document to inline sidenote
-- HTML, and append the collected notes as a @section.footnotes@
-- fallback block.
apply :: Pandoc -> Pandoc
apply doc =
let (Pandoc m blocks, (_, collected)) =
runState (walkM convertNote doc) (1, [])
notes = reverse collected
in Pandoc m $
if null notes
then blocks
else blocks ++ [footnotesSection notes]
convertNote :: Inline -> State NoteState Inline
convertNote (Note blocks) = do convertNote (Note blocks) = do
n <- get (n, acc) <- get
put (n + 1) put (n + 1, (toLabel n, blocks) : acc)
return $ RawInline "html" (renderNote n blocks) return $ RawInline "html" (renderNote n blocks)
convertNote x = return x convertNote x = return x
-- | The end-of-document fallback list. Letter labels are rendered
-- explicitly (an @<ol>@'s automatic numbering would disagree with
-- the in-text letters), so the list itself is unstyled.
footnotesSection :: [(Text, [Block])] -> Block
footnotesSection notes = RawBlock "html" $ T.concat $
[ "<section class=\"footnotes\" role=\"doc-endnotes\">"
, "<ol class=\"footnotes-list\">"
]
++ map item notes ++
[ "</ol>"
, "</section>"
]
where
item (lbl, blocks) = T.concat
[ "<li id=\"fn-", lbl, "\" class=\"footnote-item\">"
, "<span class=\"footnote-label\" aria-hidden=\"true\">", lbl, "</span>"
, blocksToHtml blocks
, "<a href=\"#snref-", lbl
, "\" class=\"footnote-back\" role=\"doc-backlink\""
, " aria-label=\"Back to reference ", lbl, "\">\x21a9\xfe0e</a>"
, "</li>"
]
-- | Convert a 1-based counter to a letter label using base-26 expansion -- | Convert a 1-based counter to a letter label using base-26 expansion
-- (Excel-column style): 1→a, 2→b, … 26→z, 27→aa, 28→ab, … 52→az, -- (Excel-column style): 1→a, 2→b, … 26→z, 27→aa, 28→ab, … 52→az,
-- 53→ba, … 702→zz, 703→aaa. Guarantees a unique label per counter so -- 53→ba, … 702→zz, 703→aaa. Guarantees a unique label per counter so
@ -53,8 +101,14 @@ renderNote n blocks =
let inner = blocksToInlineHtml blocks let inner = blocksToInlineHtml blocks
lbl = toLabel n lbl = toLabel n
in T.concat in T.concat
-- href targets the footnotes-section item: on narrow no-JS
-- viewports that is the only visible rendering of the note
-- (the adjacent .sidenote span is display:none there, and on
-- wide viewports the note is already visible in the margin).
-- sidenotes.js pairs ref/note by id and preventDefaults the
-- click, so the href only ever navigates without JS.
[ "<sup class=\"sidenote-ref\" id=\"snref-", lbl, "\">" [ "<sup class=\"sidenote-ref\" id=\"snref-", lbl, "\">"
, "<a href=\"#sn-", lbl, "\">", lbl, "</a>" , "<a href=\"#fn-", lbl, "\">", lbl, "</a>"
, "</sup>" , "</sup>"
, "<span class=\"sidenote\" id=\"sn-", lbl, "\">" , "<span class=\"sidenote\" id=\"sn-", lbl, "\">"
, "<sup class=\"sidenote-num\">", lbl, "</sup>\x00a0" , "<sup class=\"sidenote-num\">", lbl, "</sup>\x00a0"
@ -84,16 +138,25 @@ blocksToInlineHtml = T.concat . map renderOne
renderOne b = renderOne b =
blocksToHtml [b] blocksToHtml [b]
-- | Writer options for note bodies. Must agree with the math method in
-- 'Compilers.writerOpts' (KaTeX), or math inside a footnote silently
-- degrades to the writer default (PlainMath -> italics) and the
-- client-side KaTeX pass never sees it. Defined locally because
-- importing Compilers from here would create a module cycle
-- (Compilers -> Filters -> Filters.Sidenotes).
noteWriterOpts :: WriterOptions
noteWriterOpts = def { writerHTMLMathMethod = KaTeX "" }
-- | Render a list of inlines to HTML (no surrounding @<p>@). -- | Render a list of inlines to HTML (no surrounding @<p>@).
inlinesToHtml :: [Inline] -> Text inlinesToHtml :: [Inline] -> Text
inlinesToHtml inlines = inlinesToHtml inlines =
case runPure (writeHtml5String (def :: WriterOptions) (Pandoc mempty [Plain inlines])) of case runPure (writeHtml5String noteWriterOpts (Pandoc mempty [Plain inlines])) of
Left _ -> T.empty Left _ -> T.empty
Right t -> t Right t -> t
-- | Render a list of Pandoc blocks to an HTML fragment via a pure writer run. -- | Render a list of Pandoc blocks to an HTML fragment via a pure writer run.
blocksToHtml :: [Block] -> Text blocksToHtml :: [Block] -> Text
blocksToHtml blocks = blocksToHtml blocks =
case runPure (writeHtml5String (def :: WriterOptions) (Pandoc mempty blocks)) of case runPure (writeHtml5String noteWriterOpts (Pandoc mempty blocks)) of
Left _ -> T.empty Left _ -> T.empty
Right t -> t Right t -> t
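The base-26 labelling described in the toLabel comment above (1→a, 26→z, 27→aa, 703→aaa) can be sketched as follows. The body of toLabel is not part of this diff, so this is an assumed implementation that only reproduces the documented mapping. It returns String rather than Text to stay within base, and toLabelSketch is a hypothetical name.

```haskell
-- Hypothetical stand-in for toLabel: bijective base-26 ("Excel column")
-- numbering over the letters a..z. Not the module's actual code.
toLabelSketch :: Int -> String
toLabelSketch n
  | n <= 0    = ""          -- assumption: non-positive counters get no label
  | otherwise = go n ""
  where
    go 0 acc = acc
    go k acc =
      -- Subtracting 1 before dividing is what makes the scheme bijective:
      -- there is no "zero digit", so 26 maps to "z" rather than "a0".
      let (q, r) = (k - 1) `divMod` 26
      in go q (toEnum (fromEnum 'a' + r) : acc)
```

Because every positive counter maps to a distinct string, the ids built from the label (snref-, sn-, fn-) stay unique however many notes a page has.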


@ -14,7 +14,8 @@
-- extra filter logic is needed for that case. -- extra filter logic is needed for that case.
-- --
-- The filter is /not/ applied inside headings (where Fira Sans uppercase -- The filter is /not/ applied inside headings (where Fira Sans uppercase
-- text looks intentional) or inside @Code@/@RawInline@ inlines. -- text looks intentional, at any nesting depth — including headings
-- inside divs and block quotes) or inside @Code@/@RawInline@ inlines.
module Filters.Smallcaps (apply) where module Filters.Smallcaps (apply) where
import Data.Char (isUpper, isAlpha) import Data.Char (isUpper, isAlpha)
@ -25,13 +26,31 @@ import Text.Pandoc.Walk (walk)
import qualified Utils as U import qualified Utils as U
-- | Apply smallcaps detection to paragraph-level content. -- | Apply smallcaps detection to paragraph-level content.
-- Skips heading blocks to avoid false positives. -- Heading blocks are skipped at /every/ nesting level (a top-level
-- pattern match would miss a @Header@ inside a @Div@ or
-- @BlockQuote@): each header's @Str@ content is swapped for a
-- sentinel 'RawInline' before the wrapping walk and restored
-- afterwards, so 'wrapCaps' can never see it, wherever the header
-- sits in the block tree.
apply :: Pandoc -> Pandoc apply :: Pandoc -> Pandoc
apply (Pandoc meta blocks) = Pandoc meta (map applyBlock blocks) apply = walk restoreStr . walk wrapCaps . walk protectHeader
applyBlock :: Block -> Block -- | Sentinel format marking a @Str@ that must not be wrapped. It only
applyBlock b@(Header {}) = b -- leave headings untouched -- exists between the protect and restore walks inside 'apply' and
applyBlock b = walk wrapCaps b -- can never leak into the writer.
skipFmt :: Format
skipFmt = Format "smallcaps-skip"
protectHeader :: Block -> Block
protectHeader (Header lvl attr ils) = Header lvl attr (walk protectStr ils)
where
protectStr (Str t) = RawInline skipFmt t
protectStr x = x
protectHeader b = b
restoreStr :: Inline -> Inline
restoreStr (RawInline fmt t) | fmt == skipFmt = Str t
restoreStr x = x
-- | Wrap an all-caps Str token in an abbr element, preserving any trailing -- | Wrap an all-caps Str token in an abbr element, preserving any trailing
-- punctuation (comma, period, colon, semicolon, closing paren/bracket) -- punctuation (comma, period, colon, semicolon, closing paren/bracket)


@ -19,12 +19,15 @@
-- source-preview rule in 'Site.rules') and renders a -- source-preview rule in 'Site.rules') and renders a
-- syntax-highlighted snippet via Prism. -- syntax-highlighted snippet via Prism.
-- --
-- Conservative-by-design: the trigger only fires on paths under a -- Conservative-by-design: the trigger only fires on paths the
-- short whitelist of top-level directories, or a small set of named -- @/source/@ serving rule actually publishes ('isServedPath', a
-- root files. This keeps the parser cheap and avoids false positives -- mirror of @sourcePreviewable@ in 'Site.rules'), or a small set of
-- on words that happen to contain a slash and a dot. -- named root files. This keeps the parser cheap, avoids false
-- positives on words that happen to contain a slash and a dot, and
-- guarantees every wrapped path has a fetchable @/source/…@ copy.
module Filters.SourceRefs (apply, isSourcePath, forgejoSourceUrl) where module Filters.SourceRefs (apply, isSourcePath, forgejoSourceUrl) where
import Control.Monad (when)
import Data.IORef (IORef, atomicModifyIORef', newIORef, readIORef) import Data.IORef (IORef, atomicModifyIORef', newIORef, readIORef)
import qualified Data.Map.Strict as Map import qualified Data.Map.Strict as Map
import Data.Text (Text) import Data.Text (Text)
@ -94,16 +97,17 @@ classifyExistingLink x = pure x
-- Heuristic -- Heuristic
-- --------------------------------------------------------------------------- -- ---------------------------------------------------------------------------
-- | True when the text looks like a repo-relative path under one of -- | True when the text looks like a repo-relative path that the
-- the whitelisted directories (or is a whitelisted root file), ends -- @/source/@ serving rule actually publishes (or is a whitelisted
-- in a known source extension, and contains only safe path -- root file), ends in a known source extension, and contains only
-- characters. Conservative by design — the goal is no false -- safe path characters. Conservative by design — the goal is no
-- positives on prose that incidentally contains a slash and a dot. -- false positives on prose that incidentally contains a slash and a
-- dot, and no wrapped path whose popup fetch would 404.
isSourcePath :: Text -> Bool isSourcePath :: Text -> Bool
isSourcePath t = and isSourcePath t = and
[ not (T.null t) [ not (T.null t)
, T.all safeChar t , T.all safeChar t
, (hasKnownPrefix t && hasKnownExt t) || isKnownRootFile t , (isServedPath t && hasKnownExt t) || isKnownRootFile t
] ]
where where
safeChar c = safeChar c =
@ -112,11 +116,26 @@ isSourcePath t = and
|| ('0' <= c && c <= '9') || ('0' <= c && c <= '9')
|| c == '/' || c == '.' || c == '_' || c == '-' || c == '+' || c == '/' || c == '.' || c == '_' || c == '-' || c == '+'
hasKnownPrefix :: Text -> Bool -- | Mirror of the @sourcePreviewable@ whitelist in 'Site.rules' (the
hasKnownPrefix t = any (`T.isPrefixOf` t) -- rule that copies files to @/source/<path>@) — the two must stay
[ "build/", "static/", "templates/", "tools/" -- aligned so every link this filter emits has a corresponding
, "nginx/", "data/", "content/", "yaml-source/" -- @/source/…@ target for the popup to fetch. Directories Site.hs
-- does not serve (e.g. @content/@) are deliberately absent here:
-- wrapping them would emit popups that are guaranteed to 404.
isServedPath :: Text -> Bool
isServedPath t = or
[ "build/" `T.isPrefixOf` t && hasExt ".hs"
, "static/js/" `T.isPrefixOf` t
, "static/css/" `T.isPrefixOf` t
, "templates/" `T.isPrefixOf` t
, "tools/" `T.isPrefixOf` t && (hasExt ".sh" || hasExt ".py")
, "nginx/" `T.isPrefixOf` t && hasExt ".conf"
, "data/" `T.isPrefixOf` t
&& not ("/" `T.isInfixOf` T.drop 5 t) -- top-level data files only
&& (hasExt ".json" || hasExt ".yaml" || hasExt ".md" || hasExt ".bib")
] ]
where
hasExt e = e `T.isSuffixOf` T.toLower t
hasKnownExt :: Text -> Bool hasKnownExt :: Text -> Bool
hasKnownExt t = hasKnownExt t =
@ -125,7 +144,7 @@ hasKnownExt t =
[ ".hs", ".js", ".mjs", ".css", ".html" [ ".hs", ".js", ".mjs", ".css", ".html"
, ".py", ".cabal", ".md", ".yaml", ".yml" , ".py", ".cabal", ".md", ".yaml", ".yml"
, ".toml", ".sh", ".bash", ".svg", ".conf" , ".toml", ".sh", ".bash", ".svg", ".conf"
, ".json", ".ini", ".tex" , ".json", ".ini", ".tex", ".bib"
] ]
isKnownRootFile :: Text -> Bool isKnownRootFile :: Text -> Bool
@ -142,14 +161,19 @@ isKnownRootFile t = t `elem`
-- File existence cache -- File existence cache
-- --------------------------------------------------------------------------- -- ---------------------------------------------------------------------------
-- | Process-wide memo of @doesFileExist@ results, keyed by the same -- | Process-wide memo of /positive/ @doesFileExist@ results, keyed by
-- path the popup will fetch. Hakyll runs this filter once per -- the same path the popup will fetch. Hakyll runs this filter once
-- compiled page and the same source-file references recur across -- per compiled page and the same source-file references recur across
-- many pages (e.g. @build\/Filters\/Links.hs@ in the Links page, -- many pages (e.g. @build\/Filters\/Links.hs@ in the Links page,
-- the Colophon, several essays); the cache turns N stats into one -- the Colophon, several essays); the cache turns N stats into one
-- per distinct path. The build process's working directory is the -- per distinct path. Only existence is memoized: a missing file is
-- project root, so the path can be passed straight to -- re-stat'ed on every miss, so a source file created during a
-- 'doesFileExist' without prefixing. -- long-lived @make watch@ session is picked up on the next rebuild
-- instead of staying "absent" for the process lifetime. (A file
-- /deleted/ mid-watch stays cached as present until restart — the
-- benign direction: the popup fetch 404s and simply never appears.)
-- The build process's working directory is the project root, so the
-- path can be passed straight to 'doesFileExist' without prefixing.
{-# NOINLINE existsCacheRef #-} {-# NOINLINE existsCacheRef #-}
existsCacheRef :: IORef (Map.Map Text Bool) existsCacheRef :: IORef (Map.Map Text Bool)
existsCacheRef = unsafePerformIO (newIORef Map.empty) existsCacheRef = unsafePerformIO (newIORef Map.empty)
@ -161,7 +185,8 @@ existsCached path = do
Just b -> pure b Just b -> pure b
Nothing -> do Nothing -> do
b <- doesFileExist (T.unpack path) b <- doesFileExist (T.unpack path)
atomicModifyIORef' existsCacheRef (\m -> (Map.insert path b m, ())) when b $
atomicModifyIORef' existsCacheRef (\m -> (Map.insert path b m, ()))
pure b pure b
-- --------------------------------------------------------------------------- -- ---------------------------------------------------------------------------


@ -5,7 +5,13 @@
-- HTML placeholders that transclude.js resolves at runtime. -- HTML placeholders that transclude.js resolves at runtime.
-- --
-- A directive must be the sole content of a line (after trimming) to be -- A directive must be the sole content of a line (after trimming) to be
-- replaced — this prevents accidental substitution inside prose or code. -- replaced — this prevents accidental substitution inside prose.
--
-- Code protection (honest scope): lines inside /fenced/ code blocks
-- are passed through untouched ('Filters.Wikilinks.mapOutsideFences'),
-- so fenced examples can show @{{slug}}@ literally. Indented code
-- blocks and inline code spans are NOT recognised — a full-line
-- directive inside either is still rewritten.
-- --
-- Examples: -- Examples:
-- {{my-essay}} → full-page transclusion of /my-essay.html -- {{my-essay}} → full-page transclusion of /my-essay.html
@ -14,11 +20,13 @@
module Filters.Transclusion (preprocess) where module Filters.Transclusion (preprocess) where
import Data.List (isSuffixOf, isPrefixOf, stripPrefix) import Data.List (isSuffixOf, isPrefixOf, stripPrefix)
import Filters.Wikilinks (mapOutsideFences)
import qualified Utils as U import qualified Utils as U
-- | Apply transclusion substitution to the raw Markdown source string. -- | Apply transclusion substitution to the raw Markdown source string,
-- skipping lines inside fenced code blocks.
preprocess :: String -> String preprocess :: String -> String
preprocess = unlines . map processLine . lines preprocess = mapOutsideFences processLine
processLine :: String -> String processLine :: String -> String
processLine line = processLine line =


@ -37,6 +37,7 @@
module Filters.Viz (inlineViz) where module Filters.Viz (inlineViz) where
import Control.Exception (IOException, catch) import Control.Exception (IOException, catch)
import Data.Char (isHexDigit)
import Data.Maybe (fromMaybe) import Data.Maybe (fromMaybe)
import qualified Data.Text as T import qualified Data.Text as T
import System.Directory (doesFileExist) import System.Directory (doesFileExist)
@ -117,20 +118,47 @@ runScript baseDir attrs =
-- | Replace hardcoded black fill/stroke values with @currentColor@ so the -- | Replace hardcoded black fill/stroke values with @currentColor@ so the
-- embedded SVG inherits the CSS text colour in both light and dark modes. -- embedded SVG inherits the CSS text colour in both light and dark modes.
--
-- Quoted attribute forms (@fill="#000"@) are self-delimiting — the
-- closing quote bounds the match — so plain 'T.replace' is safe for
-- them. Unquoted style-property forms (@fill:#000@) are not: naive
-- replacement would also fire on the prefix of a longer hex colour
-- (@fill:#000080@ → @fill:currentColor80@, invalid CSS). Those go
-- through 'replaceHexColor', which rewrites a match only when it is
-- not followed by another hex digit.
processColors :: T.Text -> T.Text processColors :: T.Text -> T.Text
processColors processColors
= T.replace "fill=\"#000\"" "fill=\"currentColor\"" = T.replace "fill=\"#000\"" "fill=\"currentColor\""
. T.replace "fill=\"black\"" "fill=\"currentColor\"" . T.replace "fill=\"black\"" "fill=\"currentColor\""
. T.replace "stroke=\"#000\"" "stroke=\"currentColor\"" . T.replace "stroke=\"#000\"" "stroke=\"currentColor\""
. T.replace "stroke=\"black\"" "stroke=\"currentColor\"" . T.replace "stroke=\"black\"" "stroke=\"currentColor\""
. T.replace "fill:#000" "fill:currentColor" . replaceHexColor "fill:#000" "fill:currentColor"
. T.replace "fill:black" "fill:currentColor" . T.replace "fill:black" "fill:currentColor"
. T.replace "stroke:#000" "stroke:currentColor" . replaceHexColor "stroke:#000" "stroke:currentColor"
. T.replace "stroke:black" "stroke:currentColor" . T.replace "stroke:black" "stroke:currentColor"
. T.replace "fill=\"#000000\"" "fill=\"currentColor\"" . T.replace "fill=\"#000000\"" "fill=\"currentColor\""
. T.replace "stroke=\"#000000\"" "stroke=\"currentColor\"" . T.replace "stroke=\"#000000\"" "stroke=\"currentColor\""
. T.replace "fill:#000000" "fill:currentColor" . replaceHexColor "fill:#000000" "fill:currentColor"
. T.replace "stroke:#000000" "stroke:currentColor" . replaceHexColor "stroke:#000000" "stroke:currentColor"
-- | 'T.replace' restricted to hex-boundary-terminated matches: an
-- occurrence of @needle@ is rewritten only when the character after
-- it is not another hex digit, so @fill:#000@ never fires inside the
-- longer colours @fill:#0008@, @fill:#000080@, or @fill:#00000080@.
-- (Mirrors 'Filters.Score.replaceHexColor'.)
replaceHexColor :: T.Text -> T.Text -> T.Text -> T.Text
replaceHexColor needle replacement = go
where
go t =
let (pre, rest) = T.breakOn needle t
in if T.null rest
then pre
else
let after = T.drop (T.length needle) rest
in case T.uncons after of
Just (c, _) | isHexDigit c ->
pre <> needle <> go after
_ -> pre <> replacement <> go after
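To make the hex-boundary rule concrete, here is a minimal String-based sketch of the same idea. It is an illustration only: the module itself works on Data.Text, and replaceHexColorS is a hypothetical name chosen to avoid confusion with the real function.

```haskell
import Data.Char (isHexDigit)
import Data.List (isPrefixOf)

-- Rewrite each occurrence of `needle` unless the next character is a hex
-- digit, in which case the occurrence is the prefix of a longer colour
-- and is copied through unchanged.
replaceHexColorS :: String -> String -> String -> String
replaceHexColorS needle replacement
  | null needle = id            -- assumption: an empty needle is a no-op
  | otherwise   = go
  where
    n = length needle
    go [] = []
    go s@(c:cs)
      | needle `isPrefixOf` s =
          let after = drop n s
          in case after of
               (d:_) | isHexDigit d -> needle ++ go after
               _                    -> replacement ++ go after
      | otherwise = c : go cs
```

With this sketch, `replaceHexColorS "fill:#000" "fill:currentColor" "fill:#000;stroke:#111"` gives `"fill:currentColor;stroke:#111"`, while the input `"fill:#000080"` comes back unchanged.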
-- --------------------------------------------------------------------------- -- ---------------------------------------------------------------------------
-- JSON safety for <script> embedding -- JSON safety for <script> embedding


@ -12,23 +12,129 @@
-- replaced with hyphens, non-alphanumeric characters stripped, and -- replaced with hyphens, non-alphanumeric characters stripped, and
-- a @.html@ suffix appended so the link resolves identically under -- a @.html@ suffix appended so the link resolves identically under
-- the dev server, file:// previews, and nginx in production. -- the dev server, file:// previews, and nginx in production.
module Filters.Wikilinks (preprocess) where --
-- Code protection (honest scope): lines inside /fenced/ code blocks
-- are passed through untouched (see 'mapOutsideFences'), and within a
-- line, inline code spans (backtick runs, CommonMark equal-length
-- matching) are skipped — so both fenced and @`inline`@ examples can
-- show @[[…]]@ literally. Indented code blocks and code spans that
-- cross a line break are NOT recognised; a wikilink inside those is
-- still rewritten.
module Filters.Wikilinks (preprocess, mapOutsideFences) where
import Data.Char (isAlphaNum, toLower, isSpace) import Data.Char (isAlphaNum, toLower, isSpace)
import Data.List (intercalate) import Data.List (intercalate)
import qualified Utils as U import qualified Utils as U
-- | Scan the raw Markdown source for @[[…]]@ wikilinks and replace them -- | Scan the raw Markdown source for @[[…]]@ wikilinks and replace them
-- with standard Markdown link syntax. -- with standard Markdown link syntax. Processing is line-by-line and
-- skips fenced code blocks; a wikilink therefore cannot span a line
-- break (which was never a sensible authoring form).
preprocess :: String -> String preprocess :: String -> String
preprocess [] = [] preprocess = mapOutsideFences replaceWikilinks
preprocess ('[':'[':rest) =
case break (== ']') rest of replaceWikilinks :: String -> String
(inner, ']':']':after) replaceWikilinks = go
| not (null inner) -> where
toMarkdownLink inner ++ preprocess after go [] = []
_ -> '[' : '[' : preprocess rest -- Inline code span: a backtick run opens a span closed by a run of
preprocess (c:rest) = c : preprocess rest -- exactly the same length (CommonMark). Its body passes through
-- verbatim so documentation can quote @`[[…]]`@ literally. An
-- unclosed run is literal text — and then a following @[[…]]@ is
-- genuinely a wikilink, matching how Pandoc will read the line.
go s@('`':_) =
let (run, afterRun) = span (== '`') s
in case codeSpan (length run) afterRun of
Just (body, after) -> run ++ body ++ run ++ go after
Nothing -> run ++ go afterRun
go ('[':'[':rest) =
case break (== ']') rest of
(inner, ']':']':after)
| not (null inner) ->
toMarkdownLink inner ++ go after
_ -> '[' : '[' : go rest
go (c:rest) = c : go rest
-- @codeSpan n s@: the span body and the remainder after a closing
-- run of exactly @n@ backticks; 'Nothing' when no closer exists on
-- this line.
codeSpan :: Int -> String -> Maybe (String, String)
codeSpan n = loop
where
loop [] = Nothing
loop s@('`':_) =
let (run, rest) = span (== '`') s
in if length run == n
then Just ("", rest)
else prepend run <$> loop rest
loop (c:cs) = prepend [c] <$> loop cs
prepend pre (body, after) = (pre ++ body, after)
-- ---------------------------------------------------------------------------
-- Fence-aware line mapping (shared by all source-level preprocessors)
-- ---------------------------------------------------------------------------
-- | Apply a line transformation to every line that is not part of a
-- fenced code block. Shared by the three source-level preprocessors
-- (wikilinks here, 'Filters.Transclusion', 'Filters.EmbedPdf') so
-- their directive syntax can be quoted literally inside fenced code.
--
-- Fence tracking follows CommonMark: an opener is at most three
-- spaces of indentation followed by a run of at least three backticks
-- or tildes (longer runs allowed); for backtick fences the info
-- string may not contain a backtick. The closer uses the same fence
-- character, a run at least as long as the opener, and nothing but
-- whitespace after it. An unclosed fence extends to the end of the
-- document. Fence delimiter lines themselves pass through untouched.
--
-- Honest scope: only /fenced/ code blocks are protected. Indented
-- code blocks and inline code spans are not recognised here — a
-- directive inside either is still rewritten.
mapOutsideFences :: (String -> String) -> String -> String
mapOutsideFences f = unlines . go Nothing . lines
where
go _ [] = []
go Nothing (l:ls) =
case openingFence l of
Just fence -> l : go (Just fence) ls
Nothing -> f l : go Nothing ls
go st@(Just fence) (l:ls)
| closesFence fence l = l : go Nothing ls
| otherwise = l : go st ls
-- | The fence character and run length of a CommonMark fence opener,
-- or 'Nothing' when the line does not open a fence.
openingFence :: String -> Maybe (Char, Int)
openingFence l = do
rest <- stripFenceIndent l
case rest of
(c:_) | c == '`' || c == '~' ->
let run = takeWhile (== c) rest
n = length run
info = drop n rest
in if n >= 3 && (c == '~' || '`' `notElem` info)
then Just (c, n)
else Nothing
_ -> Nothing
-- | True when the line closes the fence opened by @(c, n)@: the same
-- fence character, a run at least as long as the opener, and only
-- whitespace after it.
closesFence :: (Char, Int) -> String -> Bool
closesFence (c, n) l =
case stripFenceIndent l of
Nothing -> False
Just rest ->
let run = takeWhile (== c) rest
in length run >= n && all isSpace (drop (length run) rest)
-- | Strip up to three leading spaces (the indentation CommonMark allows
-- on a fence line); 'Nothing' for four or more, which would be an
-- indented code block rather than a fence.
stripFenceIndent :: String -> Maybe String
stripFenceIndent l =
let (indent, rest) = span (== ' ') l
in if length indent <= 3 then Just rest else Nothing
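The effect of fence-aware mapping is easiest to see on a small input. The sketch below is a deliberately simplified version of the tracker above: it assumes unindented fences and ignores the info-string rule, and mapOutsideFencesSketch is a hypothetical name. It is meant to show the behaviour, not to replace the CommonMark-accurate code in the diff.

```haskell
import Data.Char (isSpace, toUpper)

-- Apply `f` to every line outside a fenced block; fence delimiter lines
-- and fenced content pass through untouched.
mapOutsideFencesSketch :: (String -> String) -> String -> String
mapOutsideFencesSketch f = unlines . go Nothing . lines
  where
    go _ [] = []
    go Nothing (l:ls) =
      case opener l of
        Just fence -> l : go (Just fence) ls
        Nothing    -> f l : go Nothing ls
    go st@(Just fence) (l:ls)
      | closes fence l = l : go Nothing ls
      | otherwise      = l : go st ls

    -- Simplification: no leading indentation, no info-string check.
    opener l = case l of
      (c:_) | c == '`' || c == '~' ->
        let n = length (takeWhile (== c) l)
        in if n >= 3 then Just (c, n) else Nothing
      _ -> Nothing

    -- Same fence character, a run at least as long, only whitespace after.
    closes (c, n) l =
      let run = takeWhile (== c) l
      in length run >= n && all isSpace (drop (length run) l)
```

For example, `mapOutsideFencesSketch (map toUpper)` applied to the five lines `a`, three backticks, `b`, three backticks, `c` upper-cases `a` and `c` and leaves the fenced `b` alone. Note that the `unlines . lines` round trip always ends the output with a newline, even when the input did not.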
-- | Convert the inner content of @[[…]]@ to a Markdown link. -- | Convert the inner content of @[[…]]@ to a Markdown link.
-- --


@ -230,7 +230,7 @@ data EpistemicData = EpistemicData
, epPeerStatus :: Maybe String -- ^ Validated peer-status slug ('Nothing' when absent / unreviewed / invalid). , epPeerStatus :: Maybe String -- ^ Validated peer-status slug ('Nothing' when absent / unreviewed / invalid).
, epResultShape :: Maybe String -- ^ Validated result-shape value. , epResultShape :: Maybe String -- ^ Validated result-shape value.
, epStability :: String -- ^ Always one of the five stability labels. , epStability :: String -- ^ Always one of the five stability labels.
, epTrust :: Int -- ^ Trust score 0–100 (60/40 weighted; @proved@ substitutes 100 for confidence). , epTrust :: Maybe Int -- ^ Trust score 0–100 (60/40 weighted; @proved@ substitutes 100 for confidence). 'Nothing' when confidence or evidence is missing — no label is rendered.
} }
-- | Read the figure inputs from a Hakyll item's metadata + git history. -- | Read the figure inputs from a Hakyll item's metadata + git history.
@ -267,15 +267,16 @@ readEpistemicData item = do
trimS = trim' trimS = trim'
-- | Trust score: the same 60/40 weighted composite of confidence and -- | Trust score: the same 60/40 weighted composite of confidence and
-- evidence used by 'Contexts.overallScoreField'. Returns 0 when either -- evidence used by 'Contexts.overallScoreField'. Returns 'Nothing'
-- input is missing — which is fine for the figure (the polygon and -- when either input is missing — the figure then renders no trust
-- trust label simply collapse to the bare frame). -- label at all (it collapses to the bare frame), rather than a
computeTrust :: Maybe Int -> Maybe Int -> Int -- literal "0" indistinguishable from an authored zero score.
computeTrust :: Maybe Int -> Maybe Int -> Maybe Int
computeTrust (Just c) (Just e) = computeTrust (Just c) (Just e) =
let raw :: Double let raw :: Double
raw = fromIntegral c / 100.0 * 0.6 + fromIntegral (e - 1) / 4.0 * 0.4 raw = fromIntegral c / 100.0 * 0.6 + fromIntegral (e - 1) / 4.0 * 0.4
in max 0 (min 100 (round (raw * 100.0))) in Just (max 0 (min 100 (round (raw * 100.0))))
computeTrust _ _ = 0 computeTrust _ _ = Nothing
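A worked example of the 60/40 composite may help. The sketch below copies the new-side formula under a hypothetical name so it can be run in isolation. The evidence scale of 1 to 5 is inferred from the `(e - 1) / 4.0` term and is not stated in this diff.

```haskell
-- Stand-alone copy of the new-side formula; computeTrustSketch is a
-- hypothetical name used only for this illustration.
computeTrustSketch :: Maybe Int -> Maybe Int -> Maybe Int
computeTrustSketch (Just c) (Just e) =
  let raw :: Double
      -- confidence contributes 60 %, evidence (rescaled to 0..1) 40 %
      raw = fromIntegral c / 100.0 * 0.6 + fromIntegral (e - 1) / 4.0 * 0.4
  in Just (max 0 (min 100 (round (raw * 100.0))))
computeTrustSketch _ _ = Nothing
```

With confidence 80 and evidence 3 the composite is 0.8 × 0.6 + 0.5 × 0.4 = 0.68, so the result is `Just 68`. Confidence 0 with evidence 1 yields `Just 0`, an authored zero, whereas a missing input yields `Nothing`, which is the distinction the `Maybe Int` return type was introduced to preserve.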
-- | Same predicate as 'Contexts.isProvedConfidence' — local copy to keep -- | Same predicate as 'Contexts.isProvedConfidence' — local copy to keep
-- the module's dependency graph light (Marks → Stability only). The -- the module's dependency graph light (Marks → Stability only). The
@ -390,15 +391,16 @@ renderEpistemicFigure d = T.concat
[ "<svg xmlns=\"http://www.w3.org/2000/svg\"" [ "<svg xmlns=\"http://www.w3.org/2000/svg\""
, " viewBox=\"0 0 200 200\"" , " viewBox=\"0 0 200 200\""
, " role=\"img\"" , " role=\"img\""
, " aria-label=\"Epistemic figure: trust ", T.pack (show (epTrust d)) , " aria-label=\"Epistemic figure: "
, ", stability ", T.pack (epStability d), "\">" , maybe "" (\t -> "trust " <> T.pack (show t) <> ", ") (epTrust d)
, "stability ", T.pack (epStability d), "\">"
, renderRoundel , renderRoundel
, renderGuides , renderGuides
, renderAxes , renderAxes
, renderPolygon d , renderPolygon d
, renderVertexMarks d , renderVertexMarks d
, renderTicks (epStability d) (epPeerStatus d) , renderTicks (epStability d) (epPeerStatus d)
, renderTrustLabel (epTrust d) , maybe "" renderTrustLabel (epTrust d)
, renderResultShape (epResultShape d) (epTrust d) , renderResultShape (epResultShape d) (epTrust d)
, "</svg>" , "</svg>"
] ]
@ -578,10 +580,11 @@ renderTrustLabel score = T.concat
, " opacity=\"0.7\">TRUST</text>" , " opacity=\"0.7\">TRUST</text>"
] ]
-- | Result-shape glyph immediately to the right of the trust score. -- | Result-shape glyph immediately to the right of the trust score —
renderResultShape :: Maybe String -> Int -> T.Text -- or centred in its place when no trust score is rendered.
renderResultShape :: Maybe String -> Maybe Int -> T.Text
renderResultShape Nothing _ = "" renderResultShape Nothing _ = ""
renderResultShape (Just shape) score = renderResultShape (Just shape) mScore =
let glyph = case shape of let glyph = case shape of
"positive" -> "+" "positive" -> "+"
"negative" -> "\x2212" -- minus sign (not hyphen-minus) "negative" -> "\x2212" -- minus sign (not hyphen-minus)
@ -589,15 +592,20 @@ renderResultShape (Just shape) score =
"comparative" -> "\x223C" -- ∼ "comparative" -> "\x223C" -- ∼
"descriptive" -> "\x25A1" -- □ "descriptive" -> "\x25A1" -- □
_ -> "" _ -> ""
-- Offset proportional to the trust number's width (digits ≈ 8 px each). -- Offset proportional to the trust number's width (digits ≈ 8 px
digitCount = length (show score) -- each); with no trust label the glyph takes the centre itself.
offset = fromIntegral digitCount * 4.5 + 3 :: Double (x, anchor) = case mScore of
Just score ->
let digitCount = length (show score)
offset = fromIntegral digitCount * 4.5 + 3 :: Double
in (fxCenter + offset, "start")
Nothing -> (fxCenter, "middle")
in if T.null (T.pack glyph) in if T.null (T.pack glyph)
then "" then ""
else T.concat else T.concat
[ "<text x=\"", ff (fxCenter + offset) [ "<text x=\"", ff x
, "\" y=\"", ff (fyCenter + 4) , "\" y=\"", ff (fyCenter + 4)
, "\" text-anchor=\"start\"" , "\" text-anchor=\"", anchor, "\""
, " fill=\"currentColor\" stroke=\"none\"" , " fill=\"currentColor\" stroke=\"none\""
, " font-family=\"Spectral, serif\" font-size=\"16\">" , " font-family=\"Spectral, serif\" font-size=\"16\">"
, T.pack glyph , T.pack glyph


@ -12,6 +12,7 @@ module Pagination
) where ) where
import Hakyll import Hakyll
import Patterns (blogPattern)
-- | Items per page across most paginated lists (e.g. the blog). -- | Items per page across most paginated lists (e.g. the blog).
@ -39,7 +40,7 @@ blogPageId n = fromFilePath $ "blog/page/" ++ show n ++ "/index.html"
-- @baseCtx@: site-level context (siteCtx). -- @baseCtx@: site-level context (siteCtx).
blogPaginateRules :: Context String -> Context String -> Rules () blogPaginateRules :: Context String -> Context String -> Rules ()
blogPaginateRules itemCtx baseCtx = do blogPaginateRules itemCtx baseCtx = do
paginate <- buildPaginateWith sortAndGroup ("content/blog/*.md" .&&. hasNoVersion) blogPageId paginate <- buildPaginateWith sortAndGroup (blogPattern .&&. hasNoVersion) blogPageId
paginateRules paginate $ \pageNum pat -> do paginateRules paginate $ \pageNum pat -> do
route idRoute route idRoute
compile $ do compile $ do


@ -122,7 +122,14 @@ allWritings :: Pattern
allWritings = essayPattern .||. blogPattern .||. poetryPattern .||. fictionPattern allWritings = essayPattern .||. blogPattern .||. poetryPattern .||. fictionPattern
-- | Every content file the backlinks pass should index. Includes music -- | Every content file the backlinks pass should index. Includes music
-- landing pages and top-level standalone pages, in addition to writings. -- landing pages and top-level standalone pages, in addition to writings,
-- plus the two directory-form standalone essays (@content/me/index.md@
-- and @content/memento-mori/index.md@) — full essays rendered with
-- backlinks, whose outgoing links must be visible to the link graph.
--
-- Photography is deliberately excluded: photo pages do not render the
-- backlinks block (see 'Contexts.photographyCtx'), and caption-scale
-- entries would add link-graph noise with no consuming surface.
allContent :: Pattern allContent :: Pattern
allContent = allContent =
essayPattern essayPattern
@ -131,6 +138,8 @@ allContent =
.||. fictionPattern .||. fictionPattern
.||. musicPattern .||. musicPattern
.||. standalonePagesPattern .||. standalonePagesPattern
.||. "content/me/index.md"
.||. "content/memento-mori/index.md"
-- | Content shown on author index pages — essays + blog posts. -- | Content shown on author index pages — essays + blog posts.
-- (Poetry and fiction have their own dedicated indexes and are not -- (Poetry and fiction have their own dedicated indexes and are not


@ -27,7 +27,7 @@ import Data.Maybe (mapMaybe, fromMaybe, catMaybes)
import qualified Data.Set as Set import qualified Data.Set as Set
import Data.Set (Set) import Data.Set (Set)
import Data.Ord (Down (..), comparing) import Data.Ord (Down (..), comparing)
import System.FilePath (takeDirectory, takeFileName, replaceExtension) import System.FilePath (takeBaseName, takeDirectory, takeFileName, replaceExtension)
import qualified Data.Aeson as Aeson import qualified Data.Aeson as Aeson
import Data.Aeson (Value (..), (.=)) import Data.Aeson (Value (..), (.=))
import qualified Data.Aeson.KeyMap as KM import qualified Data.Aeson.KeyMap as KM
@ -305,10 +305,11 @@ stripIndexHtml r
-- * @exact@: 4 decimal places (~10 m) -- * @exact@: 4 decimal places (~10 m)
-- * @km@ : 2 decimal places (~1 km) -- * @km@ : 2 decimal places (~1 km)
-- * @city@ : 1 decimal place (~10 km) — default -- * @city@ : 1 decimal place (~10 km) — default
-- * other : treated as @city@ -- * other : treated as @city@ (defensive only — 'buildPin' validates
-- the precision and fails closed before consulting this function)
-- --
-- @hidden@ is handled at the call site by skipping the pin entirely; -- @hidden@ and unrecognised values are handled at the call site by
-- this function is not consulted in that case. -- skipping the pin entirely; this function is not consulted then.
roundCoord :: String -> Double -> Double roundCoord :: String -> Double -> Double
roundCoord prec x = roundCoord prec x =
let n = case prec of let n = case prec of
@ -336,7 +337,10 @@ parseGeo meta = case KM.lookup "geo" meta of
-- | Build a single pin object from a photo entry. Returns 'Nothing' -- | Build a single pin object from a photo entry. Returns 'Nothing'
-- when: -- when:
-- * the entry has no @geo:@ frontmatter, or -- * the entry has no @geo:@ frontmatter, or
-- * it has @geo-precision: hidden@, or -- * @geo-precision:@ is anything other than @exact@/@km@/@city@ —
-- @hidden@ and unrecognised values (typos, wrong case) alike.
-- Failing closed means a typo'd \"hidden\" can never publish
-- coordinates the author meant to suppress.
-- * the entry has no resolvable route (shouldn't happen for -- * the entry has no resolvable route (shouldn't happen for
-- photographyPattern items, but be defensive). -- photographyPattern items, but be defensive).
buildPin :: Item String -> Compiler (Maybe Value) buildPin :: Item String -> Compiler (Maybe Value)
@ -345,13 +349,21 @@ buildPin item = do
meta <- getMetadata ident meta <- getMetadata ident
mRoute <- getRoute ident mRoute <- getRoute ident
case (parseGeo meta, lookupString "geo-precision" meta, mRoute) of case (parseGeo meta, lookupString "geo-precision" meta, mRoute) of
(_, Just "hidden", _) -> return Nothing (Just (lat, lon), prec, Just r)
(Just (lat, lon), prec, Just r) -> | maybe True (`elem` ["exact", "km", "city"]) prec ->
let prec' = fromMaybe "city" prec let prec' = fromMaybe "city" prec
rLat = roundCoord prec' lat rLat = roundCoord prec' lat
rLon = roundCoord prec' lon rLon = roundCoord prec' lon
fp = toFilePath ident fp = toFilePath ident
slug = takeFileName (takeDirectory fp) -- Directory entries (<slug>/index.md) and series children
-- (<series>/<photo>.md) both key assets off the parent
-- directory; a flat single (content/photography/foo.md)
-- has no entry directory, so its slug is its basename and
-- its co-located assets route to /photography/ directly.
isFlat = takeDirectory fp == "content/photography"
&& takeFileName fp /= "index.md"
slug = if isFlat then takeBaseName fp
else takeFileName (takeDirectory fp)
title = fromMaybe slug (lookupString "title" meta) title = fromMaybe slug (lookupString "title" meta)
photo = lookupString "photo" meta photo = lookupString "photo" meta
-- Trim trailing "index.html" so the click-through URL -- Trim trailing "index.html" so the click-through URL
@ -359,7 +371,8 @@ buildPin item = do
url = "/" ++ stripIndexHtml r url = "/" ++ stripIndexHtml r
thumb = case photo of thumb = case photo of
Just p | not (null p) -> Just p | not (null p) ->
"/photography/" ++ slug ++ "/" ++ p if isFlat then "/photography/" ++ p
else "/photography/" ++ slug ++ "/" ++ p
_ -> "" _ -> ""
captured = lookupString "captured" meta captured = lookupString "captured" meta
in return $ Just $ Aeson.object $ in return $ Just $ Aeson.object $
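
The fail-closed precision check in this hunk can be exercised in isolation. The following is a minimal sketch, not the module's actual code: `allowedPrecisions`, `decimals`, `roundTo`, and `pinCoords` are hypothetical names, and the 4 / 2 / 1 decimal mapping is taken from the `roundCoord` comment above. Only an absent precision or one of the three known values yields coordinates; `hidden`, typos, and wrong-case spellings all yield `Nothing`.

```haskell
import Data.Maybe (fromMaybe)

-- Precisions that are allowed to publish a pin. Anything else is suppressed.
allowedPrecisions :: [String]
allowedPrecisions = ["exact", "km", "city"]

-- Decimal places per precision (hypothetical helper mirroring the
-- documented 4 / 2 / 1 mapping; unknown values fall back to city).
decimals :: String -> Int
decimals "exact" = 4
decimals "km"    = 2
decimals _       = 1

-- Round to n decimal places. Haskell's 'round' rounds exact halves to even.
roundTo :: Int -> Double -> Double
roundTo n x = fromIntegral (round (x * f) :: Integer) / f
  where
    f = 10 ^ n

-- Fail closed: a missing precision defaults to city, a recognised one is
-- honoured, and every other value (including "hidden") publishes nothing.
pinCoords :: Maybe String -> (Double, Double) -> Maybe (Double, Double)
pinCoords prec (lat, lon)
  | maybe True (`elem` allowedPrecisions) prec =
      let p = fromMaybe "city" prec
      in Just (roundTo (decimals p) lat, roundTo (decimals p) lon)
  | otherwise = Nothing
```

Loaded in GHCi, `pinCoords (Just "hiden") (41.8268, -71.4025)` evaluates to `Nothing`, which is the property the commit is after: a misspelled `hidden` can no longer fall through to the `city` default.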
@ -443,13 +456,20 @@ photographyFeedDescription = field "description" $ \item -> do
body <- itemBody <$> (loadSnapshot ident "content" :: Compiler (Item String)) body <- itemBody <$> (loadSnapshot ident "content" :: Compiler (Item String))
meta <- getMetadata ident meta <- getMetadata ident
let fp = toFilePath ident let fp = toFilePath ident
isDir = takeFileName fp == "index.md" -- Same asset-path derivation as 'buildPin': directory entries
-- (<slug>/index.md) and series children (<series>/<photo>.md)
-- both key assets off the parent directory; a flat single
-- (content/photography/foo.md) has no entry directory, so its
-- co-located assets route to /photography/ directly.
isFlat = takeDirectory fp == "content/photography"
&& takeFileName fp /= "index.md"
slug = takeFileName (takeDirectory fp) slug = takeFileName (takeDirectory fp)
photo = lookupString "photo" meta imgTag = case lookupString "photo" meta of
imgTag = case (isDir, photo) of Just p | not (null p) ->
(True, Just p) | not (null p) -> let src = if isFlat then "/photography/" ++ p
"<p><img src=\"https://levineuwirth.org/photography/" else "/photography/" ++ slug ++ "/" ++ p
++ slug ++ "/" ++ p ++ "\" alt=\"\"></p>\n" in "<p><img src=\"https://levineuwirth.org"
++ src ++ "\" alt=\"\"></p>\n"
_ -> "" _ -> ""
return (imgTag ++ body) return (imgTag ++ body)
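
Both `buildPin` and the feed description now derive asset paths the same way. A small standalone sketch of that derivation follows, assuming the three layouts named in the comments (directory entry, series child, flat single). The function names `slugOf` and `assetUrl` and the example file names are illustrative and do not appear in the module.

```haskell
import System.FilePath (takeBaseName, takeDirectory, takeFileName)

-- A flat single lives directly under content/photography and is not
-- an index page.
isFlat :: FilePath -> Bool
isFlat fp =
  takeDirectory fp == "content/photography"
    && takeFileName fp /= "index.md"

-- Flat singles are keyed by basename; everything else by parent directory.
slugOf :: FilePath -> String
slugOf fp
  | isFlat fp = takeBaseName fp
  | otherwise = takeFileName (takeDirectory fp)

-- Site-relative URL of a co-located asset named in the photo: field.
assetUrl :: FilePath -> String -> String
assetUrl fp photo
  | isFlat fp = "/photography/" ++ photo
  | otherwise = "/photography/" ++ slugOf fp ++ "/" ++ photo
```

Keeping the derivation in one helper would also remove the duplicated comment block; the diff chooses duplication, so treat this as one possible consolidation rather than what the commit does.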


@ -49,7 +49,8 @@ instance Aeson.FromJSON SimilarEntry where
-- --------------------------------------------------------------------------- -- ---------------------------------------------------------------------------
-- | Maximum entries rendered in the "Related" block. The on-disk JSON may -- | Maximum entries rendered in the "Related" block. The on-disk JSON may
-- contain more (embed.py's TOP_N); the template caps the display. -- contain more (embed.py's TOP_N); 'similarLinksField' caps the list
-- (@take maxSimilar@) before rendering.
maxSimilar :: Int maxSimilar :: Int
maxSimilar = 3 maxSimilar = 3
@ -101,10 +102,10 @@ normaliseUrl url =
-- | Percent-decode @%XX@ escapes (UTF-8) so percent-encoded paths -- | Percent-decode @%XX@ escapes (UTF-8) so percent-encoded paths
-- collide with their decoded form on map lookup. Mirrors -- collide with their decoded form on map lookup. Mirrors
-- 'Backlinks.percentDecode'; the two implementations are intentionally -- 'Backlinks.percentDecode' (and 'Backlinks.normaliseUrl' now applies
-- duplicated because they apply different normalisations *before* -- the same strip-@index.html@-then-@.html@ normalisation as this
-- decoding (Backlinks strips @.html@ unconditionally; SimilarLinks -- module); the duplication keeps the two modules dependency-free of
-- preserves the trailing-slash form for index pages). -- each other.
percentDecode :: String -> String percentDecode :: String -> String
percentDecode = T.unpack . TE.decodeUtf8With TE.lenientDecode . BS.pack . go percentDecode = T.unpack . TE.decodeUtf8With TE.lenientDecode . BS.pack . go
where where
@ -121,6 +122,25 @@ percentDecode = T.unpack . TE.decodeUtf8With TE.lenientDecode . BS.pack . go
| c >= 'A' && c <= 'F' = Just (fromEnum c - fromEnum 'A' + 10) | c >= 'A' && c <= 'F' = Just (fromEnum c - fromEnum 'A' + 10)
| otherwise = Nothing | otherwise = Nothing
-- | Percent-encode a string for use as a URI query value: RFC 3986
-- unreserved characters pass through; everything else — including @&@,
-- @?@, @#@, spaces, and non-ASCII text via its UTF-8 bytes — becomes
-- @%XX@. Hand-rolled (the moral equivalent of network-uri's
-- @escapeURIString isUnreserved@) because network-uri is not otherwise
-- a dependency. The output is also HTML-attribute-safe: it contains
-- only unreserved characters and @%XX@ escapes.
percentEncode :: String -> String
percentEncode = concatMap enc . BS.unpack . TE.encodeUtf8 . T.pack
where
enc b
| unreserved b = [toEnum (fromIntegral b)]
| otherwise = ['%', hexDigit (b `div` 16), hexDigit (b `mod` 16)]
unreserved b =
let c = toEnum (fromIntegral b) :: Char
in (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
|| (c >= '0' && c <= '9') || c `elem` ("-._~" :: String)
hexDigit n = "0123456789ABCDEF" !! fromIntegral n
-- --------------------------------------------------------------------------- -- ---------------------------------------------------------------------------
-- HTML rendering -- HTML rendering
-- --------------------------------------------------------------------------- -- ---------------------------------------------------------------------------
@ -153,8 +173,14 @@ renderSimilarLinks entries =
++ "</a></li>\n" ++ "</a></li>\n"
renderPdf se = renderPdf se =
let raw = seUrl se -- The PDF path becomes the @file=@ query value, so it must be
viewerUrl = "/pdfjs/web/viewer.html?file=" ++ escapeHtml raw -- percent-encoded (HTML escaping alone leaves @&@/@?@/@#@/spaces
-- free to break the query). A @#page=N@ fragment stays a fragment
-- of the viewer URL itself — PDF.js reads it from location.hash.
let raw = seUrl se
(path, frag) = break (== '#') raw
viewerUrl = "/pdfjs/web/viewer.html?file="
++ percentEncode path ++ escapeHtml frag
in "<li class=\"similar-links-item\">" in "<li class=\"similar-links-item\">"
++ "<a class=\"similar-link pdf-link\"" ++ "<a class=\"similar-link pdf-link\""
++ " href=\"" ++ viewerUrl ++ "\"" ++ " href=\"" ++ viewerUrl ++ "\""
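
To see why the path must be percent-encoded while the fragment is left alone, here is a dependency-free sketch of the same idea. It is an approximation under stated assumptions: the UTF-8 step is written by hand instead of going through `Data.Text` and `ByteString` as the diff does, `escapeAttr` is a hypothetical stand-in for Hakyll's `escapeHtml`, and `viewerUrl` is a free function rather than a local binding.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (isAlphaNum, ord)

-- UTF-8 bytes of one character (hand-rolled so the sketch needs only base).
utf8Bytes :: Char -> [Int]
utf8Bytes c
  | n < 0x80    = [n]
  | n < 0x800   = [0xC0 .|. shiftR n 6, cont n]
  | n < 0x10000 = [0xE0 .|. shiftR n 12, cont (shiftR n 6), cont n]
  | otherwise   = [0xF0 .|. shiftR n 18, cont (shiftR n 12), cont (shiftR n 6), cont n]
  where
    n = ord c
    cont x = 0x80 .|. (x .&. 0x3F)

-- RFC 3986 unreserved characters pass through; every other byte becomes %XX.
percentEncode :: String -> String
percentEncode = concatMap enc . concatMap utf8Bytes
  where
    enc b
      | unreserved b = [toEnum b]
      | otherwise    = ['%', hexDigit (b `div` 16), hexDigit (b `mod` 16)]
    unreserved b =
      let c = toEnum b :: Char
      in b < 0x80 && (isAlphaNum c || c `elem` "-._~")
    hexDigit n = "0123456789ABCDEF" !! n

-- Minimal attribute escaping (hypothetical stand-in for escapeHtml).
escapeAttr :: String -> String
escapeAttr = concatMap esc
  where
    esc '&' = "&amp;"
    esc '<' = "&lt;"
    esc '>' = "&gt;"
    esc '"' = "&quot;"
    esc ch  = [ch]

-- Encode the path as the file= value; keep any #fragment on the viewer URL.
viewerUrl :: String -> String
viewerUrl raw =
  let (path, frag) = break (== '#') raw
  in "/pdfjs/web/viewer.html?file=" ++ percentEncode path ++ escapeAttr frag
```

With this sketch, a path such as `/papers/a b&c.pdf#page=3` keeps its `&` inside the `file=` value as `%26` instead of starting a second query parameter, and `#page=3` remains the fragment of the viewer URL.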


@ -31,7 +31,7 @@ import Commonplace (commonplaceCtx)
import Now (nowCtx) import Now (nowCtx)
import Contexts (siteCtx, essayCtx, postCtx, pageCtx, poetryCtx, fictionCtx, compositionCtx, import Contexts (siteCtx, essayCtx, postCtx, pageCtx, poetryCtx, fictionCtx, compositionCtx,
contentKindField, recentFirstByDisplay, contentKindField, recentFirstByDisplay,
tagLinksFieldExcludingTopSegment) tagLinksFieldExcludingTopSegment, isProvedConfidence)
import qualified Patterns as P import qualified Patterns as P
import Photography (photographyRules) import Photography (photographyRules)
import Tags (buildAllTags, applyTagRules, sidecarIdentifier, import Tags (buildAllTags, applyTagRules, sidecarIdentifier,
@ -40,7 +40,7 @@ import Pagination (blogPaginateRules)
import Stats (statsRules) import Stats (statsRules)
-- | Home-page portal grid order. Canonical ordering authority for every -- | Home-page portal grid order. Canonical ordering authority for every
-- rendering of the eight portals (currently: the home page; future -- rendering of the portals (currently: the home page; future
-- consumers follow this list). Each entry is (display name, tag name); -- consumers follow this list). Each entry is (display name, tag name);
-- the tag name is the key to everything else — URL (@/\<tag\>/@), -- the tag name is the key to everything else — URL (@/\<tag\>/@),
-- sidecar path (@content\/tag-meta\/\<tag\>.md@), and the Tags.hs -- sidecar path (@content\/tag-meta\/\<tag\>.md@), and the Tags.hs
@ -73,13 +73,17 @@ libraryShelfMax = 5
libraryIntroId :: Identifier libraryIntroId :: Identifier
libraryIntroId = fromFilePath "content/library.md" libraryIntroId = fromFilePath "content/library.md"
-- Poems inside collection subdirectories, excluding their index pages. -- | Route that strips a literal prefix from the identifier's path.
collectionPoems :: Pattern -- Hakyll's 'gsubRoute' replaces /every/ occurrence of its pattern, so
collectionPoems = "content/poetry/*/*.md" .&&. complement "content/poetry/*/index.md" -- @gsubRoute "content/"@ would also mangle a co-located directory that
-- happened to be named @content@ deeper in the path
-- All poetry content (flat + collection), excluding collection index pages. -- (@content/essays/slug/content/data.csv@ → @essays/slug/data.csv@).
allPoetry :: Pattern -- This touches only the leading occurrence; identifiers that don't
allPoetry = "content/poetry/*.md" .||. collectionPoems -- start with the prefix pass through unchanged.
stripPrefixRoute :: String -> Routes
stripPrefixRoute prefix = customRoute $ \ident ->
let fp = toFilePath ident
in fromMaybe fp (stripPrefix prefix fp)
feedConfig :: FeedConfiguration feedConfig :: FeedConfiguration
feedConfig = FeedConfiguration feedConfig = FeedConfiguration
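
The difference between the two routing strategies is easiest to see on plain strings. The sketch below is an illustration only: `replaceAll` is a hand-written literal-substring stand-in for the replace-every-occurrence behaviour the comment attributes to `gsubRoute`, and `stripLeading` is the pure core of `stripPrefixRoute` without the Hakyll `Routes` wrapper.

```haskell
import Data.List (stripPrefix)
import Data.Maybe (fromMaybe)

-- Replace every occurrence of a literal needle (stand-in for the
-- replace-all behaviour; an empty needle leaves the input unchanged).
replaceAll :: String -> String -> String -> String
replaceAll needle rep = go
  where
    go [] = []
    go s@(c : cs) = case stripPrefix needle s of
      Just rest | not (null needle) -> rep ++ go rest
      _                             -> c : go cs

-- Remove only a leading prefix; paths without it pass through unchanged.
stripLeading :: String -> FilePath -> FilePath
stripLeading prefix fp = fromMaybe fp (stripPrefix prefix fp)
```

For `content/essays/slug/content/data.csv`, `replaceAll "content/" ""` gives `essays/slug/data.csv`, while `stripLeading "content/"` gives `essays/slug/content/data.csv`, preserving the nested directory.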
@ -168,18 +172,18 @@ rules = do
-- Per-page JS files — authored alongside content in content/**/*.js. -- Per-page JS files — authored alongside content in content/**/*.js.
-- Draft JS is handled by a separate dev-only rule below. -- Draft JS is handled by a separate dev-only rule below.
match ("content/**/*.js" .&&. complement "content/drafts/**") $ do match ("content/**/*.js" .&&. complement "content/drafts/**") $ do
route $ gsubRoute "content/" (const "") route $ stripPrefixRoute "content/"
compile copyFileCompiler compile copyFileCompiler
-- Per-page JS co-located with draft essays (dev-only). -- Per-page JS co-located with draft essays (dev-only).
when isDev $ match "content/drafts/**/*.js" $ do when isDev $ match "content/drafts/**/*.js" $ do
route $ gsubRoute "content/" (const "") route $ stripPrefixRoute "content/"
compile copyFileCompiler compile copyFileCompiler
-- CSS — must be matched before the broad static/** rule to avoid -- CSS — must be matched before the broad static/** rule to avoid
-- double-matching (compressCssCompiler vs. copyFileCompiler). -- double-matching (compressCssCompiler vs. copyFileCompiler).
match "static/css/*" $ do match "static/css/*" $ do
route $ gsubRoute "static/" (const "") route $ stripPrefixRoute "static/"
compile compressCssCompiler compile compressCssCompiler
-- All other static files (fonts, JS, images, …). Build-time -- All other static files (fonts, JS, images, …). Build-time
@ -192,7 +196,7 @@ rules = do
.&&. complement "static/**/*.exif.yaml" .&&. complement "static/**/*.exif.yaml"
.&&. complement "static/**/*.palette.yaml" .&&. complement "static/**/*.palette.yaml"
) $ do ) $ do
route $ gsubRoute "static/" (const "") route $ stripPrefixRoute "static/"
compile copyFileCompiler compile copyFileCompiler
-- Templates -- Templates
@ -299,7 +303,7 @@ rules = do
-- SVG score fragments co-located with me/index.md. -- SVG score fragments co-located with me/index.md.
match "content/me/scores/*.svg" $ do match "content/me/scores/*.svg" $ do
route $ gsubRoute "content/me/" (const "") route $ stripPrefixRoute "content/me/"
compile copyFileCompiler compile copyFileCompiler
-- memento-mori/index.md — lives in its own directory so co-located SVG -- memento-mori/index.md — lives in its own directory so co-located SVG
@ -315,7 +319,7 @@ rules = do
-- SVG score fragments co-located with memento-mori/index.md. -- SVG score fragments co-located with memento-mori/index.md.
match "content/memento-mori/scores/*.svg" $ do match "content/memento-mori/scores/*.svg" $ do
route $ gsubRoute "content/memento-mori/" (const "") route $ stripPrefixRoute "content/memento-mori/"
compile copyFileCompiler compile copyFileCompiler
-- --------------------------------------------------------------------------- -- ---------------------------------------------------------------------------
@ -354,7 +358,7 @@ rules = do
.&&. complement "content/colophon.md" .&&. complement "content/colophon.md"
.&&. complement "content/current.md" .&&. complement "content/current.md"
.&&. complement "content/library.md") $ do .&&. complement "content/library.md") $ do
route $ gsubRoute "content/" (const "") route $ stripPrefixRoute "content/"
`composeRoutes` setExtension "html" `composeRoutes` setExtension "html"
compile $ pageCompiler compile $ pageCompiler
>>= loadAndApplyTemplate "templates/page.html" pageCtx >>= loadAndApplyTemplate "templates/page.html" pageCtx
@ -414,7 +418,7 @@ rules = do
.&&. complement "content/essays/*.md" .&&. complement "content/essays/*.md"
.&&. complement "content/essays/*/index.md" .&&. complement "content/essays/*/index.md"
.&&. complement "content/essays/**/*.dims.yaml") $ do .&&. complement "content/essays/**/*.dims.yaml") $ do
route $ gsubRoute "content/" (const "") route $ stripPrefixRoute "content/"
compile copyFileCompiler compile copyFileCompiler
-- Static assets co-located with draft essays (dev-only). -- Static assets co-located with draft essays (dev-only).
@ -422,14 +426,14 @@ rules = do
.&&. complement "content/drafts/essays/*.md" .&&. complement "content/drafts/essays/*.md"
.&&. complement "content/drafts/essays/*/index.md" .&&. complement "content/drafts/essays/*/index.md"
.&&. complement "content/drafts/essays/**/*.dims.yaml") $ do .&&. complement "content/drafts/essays/**/*.dims.yaml") $ do
route $ gsubRoute "content/" (const "") route $ stripPrefixRoute "content/"
compile copyFileCompiler compile copyFileCompiler
-- --------------------------------------------------------------------------- -- ---------------------------------------------------------------------------
-- Blog posts -- Blog posts
-- --------------------------------------------------------------------------- -- ---------------------------------------------------------------------------
match "content/blog/*.md" $ do match "content/blog/*.md" $ do
route $ gsubRoute "content/blog/" (const "blog/") route $ stripPrefixRoute "content/"
`composeRoutes` setExtension "html" `composeRoutes` setExtension "html"
compile $ postCompiler compile $ postCompiler
>>= saveSnapshot "content" >>= saveSnapshot "content"
@ -440,19 +444,12 @@ rules = do
-- --------------------------------------------------------------------------- -- ---------------------------------------------------------------------------
-- Poetry -- Poetry
-- --------------------------------------------------------------------------- -- ---------------------------------------------------------------------------
-- Flat poems (e.g. content/poetry/sonnet-60.md) -- All poems — flat (content/poetry/sonnet-60.md) and collection
match "content/poetry/*.md" $ do -- (content/poetry/shakespeare-sonnets/sonnet-1.md) forms share one
route $ gsubRoute "content/poetry/" (const "poetry/") -- rule; collection index pages are excluded by 'P.poetryPattern'
`composeRoutes` setExtension "html" -- itself and matched separately below.
compile $ poetryCompiler match P.poetryPattern $ do
>>= saveSnapshot "content" route $ stripPrefixRoute "content/"
>>= loadAndApplyTemplate "templates/reading.html" poetryCtx
>>= loadAndApplyTemplate "templates/default.html" poetryCtx
>>= relativizeUrls
-- Collection poems (e.g. content/poetry/shakespeare-sonnets/sonnet-1.md)
match collectionPoems $ do
route $ gsubRoute "content/poetry/" (const "poetry/")
`composeRoutes` setExtension "html" `composeRoutes` setExtension "html"
compile $ poetryCompiler compile $ poetryCompiler
>>= saveSnapshot "content" >>= saveSnapshot "content"
@ -462,7 +459,7 @@ rules = do
-- Collection index pages (e.g. content/poetry/shakespeare-sonnets/index.md) -- Collection index pages (e.g. content/poetry/shakespeare-sonnets/index.md)
match "content/poetry/*/index.md" $ do match "content/poetry/*/index.md" $ do
route $ gsubRoute "content/poetry/" (const "poetry/") route $ stripPrefixRoute "content/"
`composeRoutes` setExtension "html" `composeRoutes` setExtension "html"
compile $ pageCompiler compile $ pageCompiler
>>= loadAndApplyTemplate "templates/default.html" pageCtx >>= loadAndApplyTemplate "templates/default.html" pageCtx
@ -472,7 +469,7 @@ rules = do
-- Fiction -- Fiction
-- --------------------------------------------------------------------------- -- ---------------------------------------------------------------------------
match "content/fiction/*.md" $ do match "content/fiction/*.md" $ do
route $ gsubRoute "content/fiction/" (const "fiction/") route $ stripPrefixRoute "content/"
`composeRoutes` setExtension "html" `composeRoutes` setExtension "html"
compile $ fictionCompiler compile $ fictionCompiler
>>= saveSnapshot "content" >>= saveSnapshot "content"
@ -496,20 +493,20 @@ rules = do
-- Static assets (SVG score pages, audio, PDF) served unchanged. -- Static assets (SVG score pages, audio, PDF) served unchanged.
match "content/music/**/*.svg" $ do match "content/music/**/*.svg" $ do
route $ gsubRoute "content/" (const "") route $ stripPrefixRoute "content/"
compile copyFileCompiler compile copyFileCompiler
match "content/music/**/*.mp3" $ do match "content/music/**/*.mp3" $ do
route $ gsubRoute "content/" (const "") route $ stripPrefixRoute "content/"
compile copyFileCompiler compile copyFileCompiler
match "content/music/**/*.pdf" $ do match "content/music/**/*.pdf" $ do
route $ gsubRoute "content/" (const "") route $ stripPrefixRoute "content/"
compile copyFileCompiler compile copyFileCompiler
-- Landing page — full essay pipeline. -- Landing page — full essay pipeline.
match "content/music/*/index.md" $ do match "content/music/*/index.md" $ do
route $ gsubRoute "content/" (const "") route $ stripPrefixRoute "content/"
`composeRoutes` setExtension "html" `composeRoutes` setExtension "html"
compile $ compositionCompiler compile $ compositionCompiler
>>= saveSnapshot "content" >>= saveSnapshot "content"
@ -566,6 +563,46 @@ rules = do
>>= loadAndApplyTemplate "templates/default.html" ctx >>= loadAndApplyTemplate "templates/default.html" ctx
>>= relativizeUrls >>= relativizeUrls
-- ---------------------------------------------------------------------------
-- Poetry index
-- ---------------------------------------------------------------------------
-- Nav, the home portal grid, and the library all link /poetry/; this
-- rule is what keeps those links from 404ing. Lists flat poems and
-- collection poems alike; collection index pages are excluded by
-- 'P.poetryPattern' itself.
create ["poetry/index.html"] $ do
route idRoute
compile $ do
poems <- recentFirst =<< loadAll (P.poetryPattern .&&. hasNoVersion)
let ctx =
listField "essays" poetryCtx (return poems)
<> constField "title" "Poetry"
<> constField "portal" "true"
<> siteCtx
makeItem ""
>>= loadAndApplyTemplate "templates/essay-index.html" ctx
>>= loadAndApplyTemplate "templates/default.html" ctx
>>= relativizeUrls
-- ---------------------------------------------------------------------------
-- Fiction index
-- ---------------------------------------------------------------------------
-- Same rationale as the poetry index. content/fiction/ has no entries
-- yet; an empty match list renders an empty index rather than a 404.
create ["fiction/index.html"] $ do
route idRoute
compile $ do
stories <- recentFirst =<< loadAll (P.fictionPattern .&&. hasNoVersion)
let ctx =
listField "essays" fictionCtx (return stories)
<> constField "title" "Fiction"
<> constField "portal" "true"
<> siteCtx
makeItem ""
>>= loadAndApplyTemplate "templates/essay-index.html" ctx
>>= loadAndApplyTemplate "templates/default.html" ctx
>>= relativizeUrls
-- --------------------------------------------------------------------------- -- ---------------------------------------------------------------------------
-- New page — all content sorted by creation date, newest first -- New page — all content sorted by creation date, newest first
-- --------------------------------------------------------------------------- -- ---------------------------------------------------------------------------
@ -573,10 +610,10 @@ rules = do
route idRoute route idRoute
compile $ do compile $ do
let allContent = ( allEssays let allContent = ( allEssays
.||. "content/blog/*.md" .||. P.blogPattern
.||. "content/fiction/*.md" .||. P.fictionPattern
.||. allPoetry .||. P.poetryPattern
.||. "content/music/*/index.md" .||. P.musicPattern
) .&&. hasNoVersion ) .&&. hasNoVersion
items <- recentFirstByDisplay =<< loadAll allContent items <- recentFirstByDisplay =<< loadAll allContent
let itemCtx = contentKindField let itemCtx = contentKindField
@ -601,7 +638,7 @@ rules = do
-- Library — portal-grouped view over the /new.html dataset, deduplicated -- Library — portal-grouped view over the /new.html dataset, deduplicated
-- by primary portal. An item's primary portal is the top segment of the -- by primary portal. An item's primary portal is the top segment of the
-- first tag in its frontmatter 'tags:' list whose top segment matches a -- first tag in its frontmatter 'tags:' list whose top segment matches a
-- known portal (the eight in 'homePortals'). Items with no such tag are -- known portal (those in 'homePortals'). Items with no such tag are
-- silently dropped from the library (they remain on /new.html and on any -- silently dropped from the library (they remain on /new.html and on any
-- tag pages their frontmatter produces). -- tag pages their frontmatter produces).
-- --
@ -629,9 +666,11 @@ rules = do
-- Top segment of the first tag that names a known portal. -- Top segment of the first tag that names a known portal.
-- Nothing when no tag matches — item is excluded from library. -- Nothing when no tag matches — item is excluded from library.
-- Reads tags via 'getTags' (not lookupStringList) so the
-- scalar comma form ("tags: research, ai") is accepted with
-- the same semantics the tag pages use.
primaryPortalOf item = do primaryPortalOf item = do
meta <- getMetadata (itemIdentifier item) ts <- getTags (itemIdentifier item)
let ts = fromMaybe [] (lookupStringList "tags" meta)
return $ listToMaybe return $ listToMaybe
[ p | t <- ts [ p | t <- ts
, let p = takeWhile (/= '/') t , let p = takeWhile (/= '/') t
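
The primary-portal rule can be sketched as a pure function over a tag list. Two parts are assumptions: the hunk cuts off before the comprehension's final filter, so the membership test against the known portals is inferred from the surrounding comment, and `splitTags` is a hypothetical helper approximating how the scalar comma form is accepted. The portal names in the usage note are made up.

```haskell
import Data.Char (isSpace)
import Data.List (dropWhileEnd)
import Data.Maybe (listToMaybe)

-- Split "research, ai" into ["research", "ai"], trimming whitespace
-- and dropping empty pieces.
splitTags :: String -> [String]
splitTags s = filter (not . null) (map trim (pieces s))
  where
    pieces str = case break (== ',') str of
      (a, [])       -> [a]
      (a, _ : rest) -> a : pieces rest
    trim = dropWhileEnd isSpace . dropWhile isSpace

-- Top segment of the first tag whose top segment names a known portal.
primaryPortal :: [String] -> [String] -> Maybe String
primaryPortal portals ts =
  listToMaybe
    [ p
    | t <- ts
    , let p = takeWhile (/= '/') t
    , p `elem` portals
    ]
```

For example, with portals `["research", "ai"]`, the scalar `misc/notes, ai/alignment, research` resolves to `Just "ai"`: the first tag names no portal and is skipped, and the second wins because it comes before `research` in the list.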
@ -654,13 +693,13 @@ rules = do
-- Load every content item once, then partition by primary portal -- Load every content item once, then partition by primary portal
-- so each shelf draws from a pre-filtered list rather than -- so each shelf draws from a pre-filtered list rather than
-- re-scanning the whole corpus nine times. -- re-scanning the whole corpus once per portal.
essays <- loadAll (allEssays .&&. hasNoVersion) essays <- loadAll (allEssays .&&. hasNoVersion)
posts <- loadAll ("content/blog/*.md" .&&. hasNoVersion) posts <- loadAll (P.blogPattern .&&. hasNoVersion)
fiction <- loadAll ("content/fiction/*.md" .&&. hasNoVersion) fiction <- loadAll (P.fictionPattern .&&. hasNoVersion)
poetry <- loadAll (allPoetry .&&. hasNoVersion) poetry <- loadAll (P.poetryPattern .&&. hasNoVersion)
music <- loadAll ("content/music/*/index.md" .&&. hasNoVersion) music <- loadAll (P.musicPattern .&&. hasNoVersion)
photos <- loadAll (P.photographyPattern .&&. hasNoVersion) photos <- loadAll (P.photographyPattern .&&. hasNoVersion)
let allContent = essays ++ posts ++ fiction ++ poetry ++ music ++ photos let allContent = essays ++ posts ++ fiction ++ poetry ++ music ++ photos
:: [Item String] :: [Item String]
tagged <- mapM (\i -> (,i) <$> primaryPortalOf i) allContent tagged <- mapM (\i -> (,i) <$> primaryPortalOf i) allContent
@ -668,21 +707,30 @@ rules = do
itemsByPortal = itemsByPortal =
Map.fromListWith (++) [(p, [i]) | (Just p, i) <- tagged] Map.fromListWith (++) [(p, [i]) | (Just p, i) <- tagged]
-- Eager snapshot load registers the library-intro dependency -- Existence-guarded, like the sidecar contexts in Tags.hs:
-- unconditionally, so a first-populate of content/library.md -- deleting content/library.md degrades to a library page with
-- re-renders the library page even when the gate was previously -- no intro block rather than failing the whole compile. When
-- false (see 'sidecarContext' in Tags.hs for the same pattern). -- the file exists, the eager snapshot load registers the
_ <- loadSnapshot libraryIntroId "body" :: Compiler (Item String) -- library-intro dependency unconditionally, so a first-populate
let libraryIntroFld = field "library-intro" $ \_ -> do -- of content/library.md re-renders the library page even when
html <- itemBody <$> loadSnapshot libraryIntroId "body" -- the gate was previously false (see 'sidecarContext' in
if all isSpace html -- Tags.hs for the same pattern).
then noResult "empty library intro" introIds <- getMatches "content/library.md"
else return html libraryIntroFld <-
if libraryIntroId `elem` introIds
then do
_ <- loadSnapshot libraryIntroId "body" :: Compiler (Item String)
return $ field "library-intro" $ \_ -> do
html <- itemBody <$> loadSnapshot libraryIntroId "body"
if all isSpace html
then noResult "empty library intro"
else return html
else return mempty
-- One shelf's context contribution: the @<slug>-entries@ -- One shelf's context contribution: the @<slug>-entries@
-- listField (or absent via noResult when the shelf is -- listField (or absent via noResult when the shelf is
-- empty) plus an optional @<slug>-has-more@ gate. -- empty) plus an optional @<slug>-has-more@ gate.
portalSection p = do let portalSection p = do
let portalItems = fromMaybe [] (Map.lookup p itemsByPortal) let portalItems = fromMaybe [] (Map.lookup p itemsByPortal)
sorted <- recentFirstByDisplay portalItems sorted <- recentFirstByDisplay portalItems
@ -763,10 +811,10 @@ rules = do
bibKwMap = invertKeywordsBib bibExtrasAll bibKwMap = invertKeywordsBib bibExtrasAll
writingIds <- getMatches $ (P.essayPattern writingIds <- getMatches $ (P.essayPattern
.||. "content/blog/*.md" .||. P.blogPattern
.||. "content/fiction/*.md" .||. P.fictionPattern
.||. P.poetryPattern .||. P.poetryPattern
.||. "content/music/*/index.md") .||. P.musicPattern)
.&&. hasNoVersion .&&. hasNoVersion
writingKwPairs <- forM writingIds $ \ident -> do writingKwPairs <- forM writingIds $ \ident -> do
@@ -863,15 +911,17 @@ rules = do
             >>= relativizeUrls

     -- ---------------------------------------------------------------------------
-    -- Random page manifest — essays + blog posts only (no pagination/index pages)
+    -- Random page manifest — essays, blog posts, fiction, and poetry (flat
+    -- and collection poems alike). No pagination/index pages; music and
+    -- photography landings are also excluded.
     -- ---------------------------------------------------------------------------
     create ["random-pages.json"] $ do
         route idRoute
         compile $ do
             essays <- loadAll (allEssays .&&. hasNoVersion) :: Compiler [Item String]
-            posts <- loadAll ("content/blog/*.md" .&&. hasNoVersion) :: Compiler [Item String]
-            fiction <- loadAll ("content/fiction/*.md" .&&. hasNoVersion) :: Compiler [Item String]
-            poetry <- loadAll ("content/poetry/*.md" .&&. hasNoVersion) :: Compiler [Item String]
+            posts <- loadAll (P.blogPattern .&&. hasNoVersion) :: Compiler [Item String]
+            fiction <- loadAll (P.fictionPattern .&&. hasNoVersion) :: Compiler [Item String]
+            poetry <- loadAll (P.poetryPattern .&&. hasNoVersion) :: Compiler [Item String]
             routes <- mapM (getRoute . itemIdentifier) (essays ++ posts ++ fiction ++ poetry)
             let urls = [ "/" ++ r | Just r <- routes ]
             makeItem $ LBS.unpack (Aeson.encode urls)
@@ -884,11 +934,11 @@ rules = do
     create ["data/epistemic-meta.json"] $ do
         route idRoute
         compile $ do
             essays <- loadAll (allEssays .&&. hasNoVersion) :: Compiler [Item String]
-            posts <- loadAll ("content/blog/*.md" .&&. hasNoVersion) :: Compiler [Item String]
-            fiction <- loadAll ("content/fiction/*.md" .&&. hasNoVersion) :: Compiler [Item String]
-            poetry <- loadAll (allPoetry .&&. hasNoVersion) :: Compiler [Item String]
-            music <- loadAll ("content/music/*/index.md" .&&. hasNoVersion) :: Compiler [Item String]
+            posts <- loadAll (P.blogPattern .&&. hasNoVersion) :: Compiler [Item String]
+            fiction <- loadAll (P.fictionPattern .&&. hasNoVersion) :: Compiler [Item String]
+            poetry <- loadAll (P.poetryPattern .&&. hasNoVersion) :: Compiler [Item String]
+            music <- loadAll (P.musicPattern .&&. hasNoVersion) :: Compiler [Item String]
             let items = essays ++ posts ++ fiction ++ poetry ++ music
             pairs <- mapM epistemicEntry items
             let metaMap = Map.fromList (catMaybes pairs)
@@ -903,10 +953,10 @@ rules = do
             posts <- fmap (take 30) . recentFirst
                 =<< loadAllSnapshots
                         ( ( allEssays
-                            .||. "content/blog/*.md"
-                            .||. "content/fiction/*.md"
-                            .||. allPoetry
-                            .||. "content/music/*/index.md"
+                            .||. P.blogPattern
+                            .||. P.fictionPattern
+                            .||. P.poetryPattern
+                            .||. P.musicPattern
                           )
                           .&&. hasNoVersion
                         )
@@ -926,7 +976,7 @@ rules = do
         compile $ do
             compositions <- recentFirst
                 =<< loadAllSnapshots
-                        ("content/music/*/index.md" .&&. hasNoVersion)
+                        (P.musicPattern .&&. hasNoVersion)
                         "content"
             let feedCtx =
                     dateField "updated" "%Y-%m-%dT%H:%M:%SZ"
@@ -966,10 +1016,10 @@ rules = do
             entries <- recentFirst
                 =<< loadAllSnapshots
                         ( ( allEssays
-                            .||. "content/blog/*.md"
-                            .||. "content/fiction/*.md"
-                            .||. allPoetry
-                            .||. "content/music/*/index.md"
+                            .||. P.blogPattern
+                            .||. P.fictionPattern
+                            .||. P.poetryPattern
+                            .||. P.musicPattern
                           )
                           .&&. hasNoVersion
                         )
@@ -1011,8 +1061,12 @@ epistemicEntry item = do
             , grab "stability" meta
             ]
         obj = Map.fromList fields
-        -- Compute overall-score the same way Contexts.overallScoreField does.
-        obj' = case ( readMaybe =<< lookupString "confidence" meta :: Maybe Int
+        -- Compute overall-score the same way Contexts.overallScoreField
+        -- does, including the "proved"/"proven" sentinel -> 100.
+        confRaw = lookupString "confidence" meta
+        confInt | isProvedConfidence confRaw = Just 100
+                | otherwise = readMaybe =<< confRaw :: Maybe Int
+        obj' = case ( confInt
                     , readMaybe =<< lookupString "evidence" meta :: Maybe Int
                     ) of
             (Just conf, Just ev) ->

@@ -33,8 +33,11 @@ import Control.Exception (catch, IOException)
 import Data.Aeson (Value (..))
 import qualified Data.Aeson.KeyMap as KM
 import qualified Data.Vector as V
+import Data.List (sortBy)
 import Data.Maybe (catMaybes, fromMaybe, listToMaybe)
+import Data.Ord (comparing, Down (..))
 import Data.Time.Calendar (Day, diffDays)
+import Data.Time.Clock (getCurrentTime, utctDay)
 import Data.Time.Format (parseTimeM, formatTime, defaultTimeLocale)
 import qualified Data.Text as T
 import qualified Data.Text.IO as TIO
@@ -85,14 +88,8 @@ gitDates fp = do
 parseIso :: String -> Maybe Day
 parseIso = parseTimeM True defaultTimeLocale "%Y-%m-%d"

--- | Approximate day-span between the oldest and newest ISO date strings.
-daySpan :: String -> String -> Int
-daySpan oldest newest =
-    case (parseIso oldest, parseIso newest) of
-        (Just o, Just n) -> fromIntegral (abs (diffDays n o))
-        _ -> 0
-
--- | Derive stability label from commit dates (newest-first).
+-- | Derive stability label from commit dates (newest-first), judged as
+-- of @today@.
 --
 -- Thresholds (commit count + age in days since first commit):
 --
@@ -104,13 +101,18 @@ daySpan oldest newest =
 --
 -- These cliffs are deliberately conservative: a fast burst of commits
 -- early in a piece's life looks volatile until enough time has passed
--- to demonstrate it has settled.
-stabilityFromDates :: [String] -> String
-stabilityFromDates [] = "volatile"
-stabilityFromDates dates@(newest : _) =
-    -- 'last' is safe: the (newest:_) pattern guarantees non-empty.
-    classify (length dates) (daySpan (last dates) newest)
+-- to demonstrate it has settled. Age is measured from the first commit
+-- to /today/, not to the most recent commit — a piece written in a
+-- one-week burst must be able to stabilise as quiet time accumulates.
+stabilityFromDates :: Day -> [String] -> String
+stabilityFromDates _ [] = "volatile"
+stabilityFromDates today dates =
+    classify (length dates) ageDays
   where
+    -- 'last' is safe: the [] case is handled above.
+    ageDays = case parseIso (last dates) of
+        Just firstDay -> fromIntegral (diffDays today firstDay)
+        Nothing -> 0
     classify n age
         | n <= 1 || age < volatileAge = "volatile"
         | n <= 5 && age < revisingAge = "revising"
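
The change of reference point in the hunk above can be illustrated with a small standalone sketch. It compares the old measure (oldest commit to newest commit) with the new one (oldest commit to a caller-supplied "today"). The names `parseIsoDay`, `burstSpan`, `ageSinceFirstCommit`, `sampleDates`, and `sampleToday` are hypothetical and exist only for this illustration; the real thresholds (`volatileAge`, `revisingAge`) are not visible in the diff and are left out.

```haskell
import Data.Time.Calendar (Day, diffDays, fromGregorian)
import Data.Time.Format (defaultTimeLocale, parseTimeM)

-- Parse an ISO date such as "2026-03-01" (same format string as 'parseIso').
parseIsoDay :: String -> Maybe Day
parseIsoDay = parseTimeM True defaultTimeLocale "%Y-%m-%d"

-- Old measure: days between the oldest and newest commit.
-- The list is newest-first, so the oldest date is the last element.
burstSpan :: [String] -> Int
burstSpan [] = 0
burstSpan dates@(newest : _) =
    case (parseIsoDay (last dates), parseIsoDay newest) of
        (Just o, Just n) -> fromIntegral (abs (diffDays n o))
        _                -> 0

-- New measure: days between the oldest commit and "today".
ageSinceFirstCommit :: Day -> [String] -> Int
ageSinceFirstCommit _ [] = 0
ageSinceFirstCommit today dates =
    case parseIsoDay (last dates) of
        Just firstDay -> fromIntegral (diffDays today firstDay)
        Nothing       -> 0

-- Illustrative data: a piece written in one week, then left alone.
sampleDates :: [String]
sampleDates = ["2026-01-08", "2026-01-05", "2026-01-01"]

sampleToday :: Day
sampleToday = fromGregorian 2026 6 10
```

For this sample, `burstSpan sampleDates` is 7 no matter how much quiet time passes, while `ageSinceFirstCommit sampleToday sampleDates` is 160 and keeps growing, which is what lets a settled piece leave the "volatile" band.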
@@ -149,7 +151,9 @@ resolveStability item = do
     ignored <- readIgnore
     if srcPath `elem` ignored
         then return $ fromMaybe "volatile" (lookupString "stability" meta)
-        else stabilityFromDates <$> gitDates srcPath
+        else do
+            today <- utctDay <$> getCurrentTime
+            stabilityFromDates today <$> gitDates srcPath

 -- | Context field @$stability$@.
 -- Always resolves to a label; prefers frontmatter when the file is pinned.
@@ -166,7 +170,9 @@ lastReviewedField = field "last-reviewed" $ \item -> do
     mDate <- unsafeCompiler $ do
         ignored <- readIgnore
         if srcPath `elem` ignored
-            then return $ lookupString "last-reviewed" meta
+            -- Frontmatter convention is ISO; format it like the git
+            -- branch so pinned pages don't render a raw "2026-05-01".
+            then return $ fmtIso <$> lookupString "last-reviewed" meta
             else fmap fmtIso . listToMaybe <$> gitDates srcPath
     case mDate of
         Nothing -> fail "no last-reviewed"
@@ -228,14 +234,21 @@ versionHistoryHeadCount = 3

 -- | Load version-history entries for an item.
 -- Priority: frontmatter @history:@ list → git log dates → empty.
+--
+-- Entries are sorted newest-first by ISO date regardless of authored
+-- order: every consumer (primary/rest split, range fields) assumes the
+-- head is the newest entry, and the @history:@ list may be authored in
+-- either direction. Git dates already arrive newest-first; the sort is
+-- idempotent there.
 loadVersionHistory :: Item a -> Compiler [VHEntry]
 loadVersionHistory item = do
     let srcPath = toFilePath (itemIdentifier item)
     meta <- getMetadata (itemIdentifier item)
-    let fmEntries = parseFmHistory meta
+    let newestFirst = sortBy (comparing (Down . vhDateIso))
+        fmEntries = newestFirst (parseFmHistory meta)
     if not (null fmEntries)
         then return fmEntries
-        else unsafeCompiler (gitLogHistory srcPath)
+        else unsafeCompiler (newestFirst <$> gitLogHistory srcPath)

 -- | Wrap a list of 'VHEntry' as Hakyll Items with unique paths so the
 -- list field works correctly inside @$for$@.
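
The newest-first sort added to `loadVersionHistory` above relies on ISO dates comparing chronologically as plain strings. The following is a minimal sketch of that idea, assuming a cut-down record in place of the real `VHEntry` (whose full definition is not shown in the diff); `Entry`, `entryDateIso`, `entryNote`, and `authored` are hypothetical names.

```haskell
import Data.List (sortBy)
import Data.Ord (Down (..), comparing)

-- Hypothetical stand-in for 'VHEntry'; only the ISO date matters here.
data Entry = Entry
    { entryDateIso :: String
    , entryNote    :: String
    } deriving (Eq, Show)

-- ISO-8601 dates (YYYY-MM-DD) order chronologically as strings, so
-- wrapping the key in 'Down' gives newest-first without parsing.
newestFirst :: [Entry] -> [Entry]
newestFirst = sortBy (comparing (Down . entryDateIso))

-- A history list authored oldest-first.
authored :: [Entry]
authored =
    [ Entry "2026-03-01" "Initial draft"
    , Entry "2026-03-14" "Expanded sections"
    ]
```

Because `sortBy` is stable, applying `newestFirst` to a list that is already newest-first returns it unchanged, which matches the diff's remark that the sort is idempotent on git dates.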

@@ -156,21 +156,35 @@ stripHtmlTags = go
     skipApos (_:rs) = skipApos rs
     skipApos [] = []

--- | Normalise a page URL for backlink map lookup (strip trailing .html).
+-- | Normalise a page URL for backlink map lookup. Must mirror
+-- 'Backlinks.normaliseUrl': strip a trailing @index.html@ (keeping the
+-- directory slash) before the bare @.html@ extension, so the keys this
+-- produces match the keys written into @data/backlinks.json@.
 normUrl :: String -> String
 normUrl u
-    | ".html" `isSuffixOf` u = take (length u - 5) u
-    | otherwise = u
+    | "index.html" `isSuffixOf` u = take (length u - 10) u
+    | ".html" `isSuffixOf` u = take (length u - 5) u
+    | otherwise = u
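
The guard order in the new `normUrl` matters: the longer suffix has to be tested first, otherwise a directory index would lose only its extension. Here is a standalone sketch of the same three guards under the hypothetical name `normUrlSketch`; the sample URLs in the note below are made up.

```haskell
import Data.List (isSuffixOf)

-- Same guards as the new 'normUrl', longest suffix first.
normUrlSketch :: String -> String
normUrlSketch u
    | "index.html" `isSuffixOf` u = take (length u - 10) u  -- keep the directory slash
    | ".html"      `isSuffixOf` u = take (length u - 5) u   -- drop the bare extension
    | otherwise                   = u
```

With these guards `"/music/index.html"` becomes `"/music/"`, `"/essays/foo.html"` becomes `"/essays/foo"`, and `"/about"` is returned unchanged. With only the `.html` guard, the first case would come out as `"/music/index"`, which, going by the new comment, would no longer match the keys written into `data/backlinks.json`.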

 pad2 :: (Show a, Integral a) => a -> String
 pad2 n = if n < 10 then "0" ++ show n else show n

--- | Median of a non-empty list; returns 0 for empty.
+-- | Median of a non-empty list; returns 0 for empty. An even-length
+-- list takes the mean of the two middle elements, rounded to the
+-- nearest unit.
 median :: [Int] -> Int
 median [] = 0
-median xs = sort xs !! (length xs `div` 2)
-    -- Index is < length xs for non-empty xs, so '(!!)' is safe here
-    -- by construction. The empty case is caught by the first equation.
+median xs
+    | odd n = upper
+    | otherwise = (lower + upper + 1) `div` 2
+  where
+    -- Indexes are in range for non-empty xs (lower is consulted only
+    -- when n >= 2), so '(!!)' is safe here by construction. The empty
+    -- case is caught by the first equation.
+    sorted = sort xs
+    n = length sorted
+    upper = sorted !! (n `div` 2)
+    lower = sorted !! (n `div` 2 - 1)


 -- ---------------------------------------------------------------------------
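
To make the effect of the `median` change above concrete, this sketch puts the old upper-middle pick next to the new averaged version. `medianOld` and `medianNew` are hypothetical names; their bodies follow the removed and added lines of the hunk.

```haskell
import Data.List (sort)

-- Old behaviour: always take the upper-middle element.
medianOld :: [Int] -> Int
medianOld [] = 0
medianOld xs = sort xs !! (length xs `div` 2)

-- New behaviour: average the two middle elements of an even-length
-- list, rounding half up via the "+ 1" before the integer division.
medianNew :: [Int] -> Int
medianNew [] = 0
medianNew xs
    | odd n     = upper
    | otherwise = (lower + upper + 1) `div` 2
  where
    sorted = sort xs
    n      = length sorted
    upper  = sorted !! (n `div` 2)
    lower  = sorted !! (n `div` 2 - 1)  -- only forced when n is even, so n >= 2
```

For `[1, 2, 10, 20]` the old version returns 10 and the new one returns (2 + 10 + 1) `div` 2 = 6. For `[1, 2]` the exact mean is 1.5, which the new version rounds up to 2. Odd-length lists give the same answer in both.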
@@ -181,8 +195,11 @@ parseDay :: String -> Maybe Day
 parseDay = parseTimeM True defaultTimeLocale "%Y-%m-%d"

 -- | First Monday on or before 'day' (start of its ISO week).
+-- 'fromEnum' on 'DayOfWeek' is ISO-numbered (Monday=1 .. Sunday=7),
+-- so Monday must subtract 0 days, Sunday 6.
 weekStart :: Day -> Day
-weekStart day = addDays (fromIntegral (negate (fromEnum (dayOfWeek day)))) day
+weekStart day =
+    addDays (fromIntegral (negate (fromEnum (dayOfWeek day) - 1))) day

 -- | Intensity class for the heatmap (hm0 … hm4).
 heatClass :: Int -> String
@@ -297,7 +314,7 @@ renderHeatmap wordsByDay today =
     nDays = diffDays today startDay + 1
     allDays = [addDays i startDay | i <- [0 .. nDays - 1]]
     weekOf d = fromIntegral (diffDays d startDay `div` 7) :: Int
-    dowOf d = fromEnum (dayOfWeek d) -- Mon=0..Sun=6
+    dowOf d = fromEnum (dayOfWeek d) - 1 -- ISO 1..7 -> Mon=0..Sun=6

     svgW = (nWeeks - 1) * step + cellSz
     svgH = 6 * step + cellSz + hdrH
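
Both `weekStart` and `dowOf` were off by one for the same reason: `fromEnum` on the `time` library's `DayOfWeek` yields Monday = 1 through Sunday = 7, not 0 through 6. The sketch below shows the corrected arithmetic in isolation; `dowIndex` and `weekStartSketch` are hypothetical names, and the dates in the note are illustrative.

```haskell
import Data.Time.Calendar (Day, addDays, dayOfWeek, fromGregorian)

-- Zero-based weekday index: Monday = 0 .. Sunday = 6.
dowIndex :: Day -> Int
dowIndex d = fromEnum (dayOfWeek d) - 1

-- Monday on or before the given day: step back by the zero-based index.
weekStartSketch :: Day -> Day
weekStartSketch d = addDays (negate (fromIntegral (dowIndex d))) d
```

Taking the week of Monday 2026-06-08: `weekStartSketch` maps that Monday to itself, Wednesday 2026-06-10 to 2026-06-08, and Sunday 2026-06-14 to 2026-06-08. Without the `- 1`, the Monday would have been pushed back one day to Sunday 2026-06-07.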
@@ -752,7 +769,7 @@ renderArchive metrics =
     dl [ (k, txt v) | (k, v) <- metrics ]

 -- ---------------------------------------------------------------------------
--- Static TOC (matches the nine h2 sections above)
+-- Static TOC (matches the eleven h2 sections above)
 -- ---------------------------------------------------------------------------

 pageTOC :: H.Html

@@ -30,16 +30,18 @@ module Tags
     ) where

 import Data.Char (isSpace)
-import Data.List (intercalate, isPrefixOf, nub, sort)
+import Data.List (intercalate, isPrefixOf, nub, sort, sortBy)
 import Data.Maybe (fromMaybe, isNothing, maybeToList)
+import Data.Ord (comparing)
 import Data.Set (Set)
 import qualified Data.Set as Set
+import Data.Time.Clock (UTCTime)
+import Data.Time.Format (defaultTimeLocale, parseTimeM)
 import Hakyll
-import Pagination (sortAndGroupAt)
 import Patterns (tagIndexable)
-import Contexts (abstractField, contentKindField,
-                 recentFirstByDisplay, revisionDateFields,
-                 tagLinksFieldExcludingScope)
+import Contexts (Revision (..), abstractField, contentKindField,
+                 getRevisions, recentFirstByDisplay, revisionDateFields,
+                 siteCtx, tagLinksFieldExcludingScope)


 -- ---------------------------------------------------------------------------
@@ -80,23 +82,23 @@ expandTag t =

 -- | Top-level tags that own a section URL outside the tag system, and
 -- therefore must NOT be created as tag pages — doing so would
--- collide with a section landing route. The literal @"photography"@
--- is the only one currently affected: every photo's @tags:@ list
--- begins with the bare @"photography"@ portal tag (per the section's
--- convention), and 'tagIdentifier' would route that to
--- @"photography/index.html"@ — already owned by
--- @photographyLandingRules@.
+-- collide with a section landing route. Hakyll does not error on
+-- duplicate routes (one item silently overwrites the other), so an
+-- essay tagged e.g. @music@ would otherwise clobber
+-- @music/index.html@. The set therefore lists every namespace that
+-- owns a @<name>/index.html@ route, not just the tags currently in
+-- use: @photography@ (every photo's @tags:@ list begins with it, per
+-- the section convention) plus the other section landings and
+-- generated index namespaces.
 --
 -- Sub-tags (@photography/landscape@, @photography/film@, …) are
 -- unaffected; they keep their tag pages because no section landing
 -- claims those URLs.
---
--- Other portal tags (@music@, @poetry@, @fiction@, …) don't appear
--- here because their content types don't currently feed
--- 'tagIndexable', so the top-level tag never enters the tag system.
--- Add to this set if that ever changes.
 sectionOwnedTopLevelTags :: [String]
-sectionOwnedTopLevelTags = ["photography"]
+sectionOwnedTopLevelTags =
+    [ "photography", "poetry", "fiction", "music", "essays", "blog"
+    , "cv", "archive", "authors", "bibliography"
+    ]
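
How the widened list is applied is not visible in this hunk; the comment that follows it only says the expanded tags are filtered against it. A minimal sketch of such a filter, assuming an exact-match comparison on the whole tag string, might look like this. `reservedTopLevel` copies the new list, and `dropReserved` is a hypothetical helper, not the site's actual function.

```haskell
-- Copy of the new 'sectionOwnedTopLevelTags' value, for illustration.
reservedTopLevel :: [String]
reservedTopLevel =
    [ "photography", "poetry", "fiction", "music", "essays", "blog"
    , "cv", "archive", "authors", "bibliography"
    ]

-- Assumed behaviour: drop bare reserved names only. Sub-tags such as
-- "photography/film" are different strings and pass through untouched.
dropReserved :: [String] -> [String]
dropReserved = filter (`notElem` reservedTopLevel)
```

Under that assumption, `dropReserved ["photography", "photography/film", "ai"]` keeps `"photography/film"` and `"ai"`, so sub-tag pages survive while no tag page is generated at a route like `photography/index.html`.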

 -- | All expanded tags for an item (reads the "tags" metadata field).
 -- Filters out any 'sectionOwnedTopLevelTags' to prevent route
@@ -293,6 +295,10 @@ sidecarContext sidecarSet tag
 -- Provides the fields consumed by @templates/partials/item-card.html@
 -- (@$item-kind$@, @$date-iso$@, @$date-created$@, @$abstract$@,
 -- @$item-tags$@) with tag-ribbon suppression scoped to the current tag.
+--
+-- Composes 'siteCtx' (not bare 'defaultContext') so per-item fields
+-- the card partial gates on — notably @$has-monogram$@ — fire here
+-- the same way they do on /new.html and the library.
 tagItemCtx :: String -> Context String
 tagItemCtx scope =
     contentKindField
@@ -301,7 +307,7 @@ tagItemCtx scope =
         <> revisionDateFields
         <> tagLinksFieldExcludingScope "item-tags" scope
         <> abstractField
-        <> defaultContext
+        <> siteCtx

 -- | Page identifier for a tag index page.
 -- Page 1 → <tag>/index.html
@@ -359,9 +365,39 @@ clientPaginatedRule tag pat sidecarSet saCtx baseCtx = do
             >>= loadAndApplyTemplate "templates/default.html" ctx
             >>= relativizeUrls

+-- | Display date of an identifier: the most-recent @revised:@ entry's
+-- date when present and parseable, else the creation date. Mirrors
+-- the (unexported) @itemDisplayUTC@ behind 'Contexts.recentFirstByDisplay',
+-- but needs only 'MonadMetadata' — the paginate grouper runs in
+-- 'Rules' over bare 'Identifier's, where no 'Item's exist yet.
+identifierDisplayUTC :: (MonadMetadata m, MonadFail m)
+                     => Identifier -> m UTCTime
+identifierDisplayUTC ident = do
+    meta <- getMetadata ident
+    case getRevisions meta of
+        (r:_) | Just utc <- (parseTimeM True defaultTimeLocale "%Y-%m-%d"
+                                (revisionDateISO r) :: Maybe UTCTime)
+              -> return utc
+        _ -> getItemUTC defaultTimeLocale ident
+
+-- | Partition identifiers into pages of @n@, most recent first by
+-- /display/ date — the same revision-aware key
+-- 'recentFirstByDisplay' sorts by within each rendered page — so
+-- cross-page ordering is monotone. With creation-date partitioning
+-- (plain @sortRecentFirst@), a recently revised old item stayed on a
+-- late page but jumped to its top; now it migrates to the early page
+-- where its displayed date says it belongs.
+sortAndGroupByDisplayAt :: (MonadMetadata m, MonadFail m)
+                        => Int -> [Identifier] -> m [[Identifier]]
+sortAndGroupByDisplayAt n ids = do
+    keyed <- mapM (\i -> (,) <$> identifierDisplayUTC i <*> pure i) ids
+    return $ paginateEvery n $ map snd $ sortBy (flip (comparing fst)) keyed
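
The ordering problem that `sortAndGroupByDisplayAt` addresses can be reproduced without Hakyll. This pure sketch partitions the same four posts twice, once keyed on creation date and once on display date. Everything in it is hypothetical scaffolding: `Post`, its fields, `chunksOfN` (a simple stand-in for `paginateEvery`), `paginateBy`, and the sample data.

```haskell
import Data.List (sortBy)
import Data.Ord (comparing)

-- Hypothetical post record; ISO date strings compare chronologically.
data Post = Post
    { postName     :: String
    , creationDate :: String
    , displayDate  :: String  -- latest revision date, else the creation date
    }

-- Simple stand-in for a page splitter: consecutive groups of n.
chunksOfN :: Int -> [a] -> [[a]]
chunksOfN n xs
    | n <= 0 || null xs = []
    | otherwise         = let (page, rest) = splitAt n xs
                          in page : chunksOfN n rest

-- Sort most-recent-first by the chosen key, then split into pages.
paginateBy :: (Post -> String) -> Int -> [Post] -> [[String]]
paginateBy key n = chunksOfN n . map postName . sortBy (flip (comparing key))

samplePosts :: [Post]
samplePosts =
    [ Post "old-but-revised" "2025-01-01" "2026-06-01"
    , Post "b"               "2026-05-01" "2026-05-01"
    , Post "c"               "2026-04-01" "2026-04-01"
    , Post "d"               "2026-03-01" "2026-03-01"
    ]
```

Partitioning by `creationDate` with pages of two leaves `old-but-revised` on the second page, even though its displayed date is the newest of all. Partitioning by `displayDate` moves it to the first page, so page boundaries and within-page order agree.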

 -- | Server-side pagination at 'tagPageSize' per page. Previous/next
 -- navigation renders via @templates/partials/paginate-nav.html@;
--- the count toggle operates within the current page only.
+-- the count toggle operates within the current page only. Pages are
+-- partitioned and sorted by the same display-date key (see
+-- 'sortAndGroupByDisplayAt').
 serverPaginatedRule :: String
                     -> Pattern
                     -> Set Identifier
@@ -369,7 +405,7 @@ serverPaginatedRule :: String
                     -> Context String -- ^ base (siteCtx)
                     -> Rules ()
 serverPaginatedRule tag pat sidecarSet saCtx baseCtx = do
-    paginate <- buildPaginateWith (sortAndGroupAt tagPageSize) pat (tagPageId tag)
+    paginate <- buildPaginateWith (sortAndGroupByDisplayAt tagPageSize) pat (tagPageId tag)
     paginateRules paginate $ \pageNum pat' -> do
         route idRoute
         compile $ do

@@ -27,9 +27,9 @@ wordCount :: String -> Int
 wordCount = length . words

 -- | Estimate reading time in minutes (assumes 200 words per minute).
--- Minimum is 1 minute.
+-- Rounds up — 399 words is 2 minutes, not 1. Minimum is 1 minute.
 readingTime :: String -> Int
-readingTime s = max 1 (wordCount s `div` 200)
+readingTime s = max 1 ((wordCount s + 199) `div` 200)
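
The `+ 199` in the new `readingTime` is the usual integer ceiling-division identity, ceil(w / d) = (w + d - 1) `div` d for positive d. A short sketch on raw word counts shows the difference; `wordsPerMinute`, `readingTimeFloor`, and `readingTimeCeil` are hypothetical names that take the count directly rather than a string.

```haskell
wordsPerMinute :: Int
wordsPerMinute = 200

-- Old behaviour: floor division, clamped to at least one minute.
readingTimeFloor :: Int -> Int
readingTimeFloor w = max 1 (w `div` wordsPerMinute)

-- New behaviour: ceiling division via (w + d - 1) `div` d, same clamp.
readingTimeCeil :: Int -> Int
readingTimeCeil w = max 1 ((w + wordsPerMinute - 1) `div` wordsPerMinute)
```

At 399 words the floor version gives 1 minute and the ceiling version gives 2. At exactly 400 both give 2, and at 401 the ceiling version moves to 3. The `max 1` clamp still covers the empty case, since (0 + 199) `div` 200 is 0.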

 -- | Escape HTML special characters: @&@, @<@, @>@, @\"@, @\'@.
 --
@@ -62,7 +62,11 @@ trim :: String -> String
 trim = dropWhileEnd isSpace . dropWhile isSpace

 -- | Lowercase a string, drop everything that isn't alphanumeric or
--- space, then replace runs of spaces with single hyphens.
+-- space, then replace each space with a hyphen. Note that a run of
+-- spaces therefore becomes a run of hyphens (@"A B" → "a--b"@) —
+-- deliberately left as-is, since every slug on the site is generated
+-- by this one function and collapsing runs now would move existing
+-- author URLs.
 --
 -- Used for author URL slugs (e.g. @"Levi Neuwirth" → "levi-neuwirth"@).
 -- Centralised here so 'Authors' and 'Contexts' cannot drift on Unicode

@@ -68,8 +68,8 @@ constraints: any.Glob ==0.10.2,
              any.deepseq ==1.4.8.1,
              any.digest ==0.0.2.1,
              any.directory ==1.3.8.5,
-             any.distributive ==0.6.2.1,
-             any.djot ==0.1.2.3,
+             any.distributive ==0.6.3,
+             any.djot ==0.1.2.4,
              any.dlist ==1.0,
              any.doclayout ==0.5.0.1,
              any.doctemplates ==0.11.0.1,
@@ -198,7 +198,7 @@ constraints: any.Glob ==0.10.2,
              any.unliftio-core ==0.2.1.0,
              any.unordered-containers ==0.2.20.1,
              any.utf8-string ==1.0.2,
-             any.uuid-types ==1.0.6,
+             any.uuid-types ==1.0.6.1,
              any.vault ==0.3.1.6,
              any.vector ==0.13.2.0,
              any.vector-algorithms ==0.9.1.0,

@@ -1,7 +1,6 @@
 ---
 title: Colophon
 date: 2026-03-21
-modified: 2026-04-27
 status: "Durable"
 confidence: 93
 tags: [meta]

@@ -1,23 +0,0 @@
---
title: "The Specification Dilemma"
date: 2026-04-20 # required; used for ordering, feed, and display
abstract: > # optional; shown in the metadata block and link previews
We should not consider AI entities as mere tools, though they may be the raw foundation from which exceptional tools for thought are constructed to augment the human mind. Rather, we should consider AI as the ultimate distillation and consolidation of humanity's achievements - the ultimate progeny of our civilization.
tags: # optional; see Tags section
- ai
- tech
# Epistemic profile — all optional; the entire section is hidden unless `status` is set
status: "Draft" # Draft | Working model | Durable | Refined | Superseded | Deprecated
confidence: 100 # 0–100 integer (%)
importance: 5 # 1–5 integer (rendered as filled/empty dots ●●●○○)
evidence: 1 # 1–5 integer (same)
scope: civilizational # personal | local | average | broad | civilizational
novelty: idiosyncratic # conventional | moderate | idiosyncratic | innovative
practicality: moderate # abstract | low | moderate | high | exceptional
confidence-history: # list of integers; trend arrow derived from last two entries
---
TODO: block quote about Richard Feynman and the beauty of science - idea "it's more beautiful this way"
I have often felt there has been a loss of wonder from the world, and I lament this fact.

@@ -1,41 +0,0 @@
---
title: "The Modern Idolatry"
date: 2026-04-06
abstract: >
Thoughts on idolizing notions of success, whether extrinsic or intrinsic, prompted by my upcoming graduation from Brown University and a recent week spent in Paris.
tags:
- miscellany
- philosophy
- personal
- personal/travel
authors:
- "Levi Neuwirth | /me.html"
status: "Draft"
history:
- date: "2026-04-06"
---
Travel affects me profoundly, and the effect is strangely uniform. There is a hierarchical structure of dichotomies that seems to define most aspects of my life, and my interactions with place are no exception to this rule. One of the dichotomies is as follows: I am rather accustomed to moving around in my adult life to date, never spending more than 4 months in a place before spending at least a few weeks somewhere else, and yet I rapidly develop a sense of "home" wherever I am - a stagnation of sorts, an acceptance of the region in which I reside and an abstraction away of the remainder of the world to some vast, esoteric TERRA INCOGNITA. Perhaps the most profound, persistent personal effect of travel on me is that it knocks me out of this mental state of spatial hibernation, reminding me that there is an entire world beyond that which I consistently perceive, and that I have the means to do something to have a positive impact on it. This has been a profoundly important sensation for me to have for many years now, and is thus one basis by which travel is consistently a high priority for me.
This is often combined with a sense of grand melancholy, the sort that for me is nearly ubiquitous in the presence of grandeur and beauty. It is a different incarnation of the same melancholy^[I should emphasize here that while "melancholy" may in general invoke a negative connotation, I do not feel that this is a negative emotion whatsoever. To me, the primary effect of melancholy, or at least melancholy of this sort, is an amplification of the imposing impetus, usually some sense of grandeur. The melancholy is like delicate cinnamon powder added to the top of a pristine flat white.] that I feel when I listen to a profound piece of music, view a painting that I enjoy, or reach the summit of a mountain that I have been embracing for hours. In this case the strength is perhaps yielded by the confluence of grandeur of the natural world - the vastness of space, the mystery of distinct regions that I have yet to know and the warm embrace of returning to those which I know but not well - and that of the human world - the various cultures, languages, beliefs, institutions, and above all people that are present in various places.
This grand, amplified melancholy typically has three causes in my life, two of which I have already mentioned. The third is instances of outward-facing "success" - I typically feel melancholic and pensive when I have done something or crossed some milestone^["Milestones" are not terms that I would use nor guidelines or aspects of some personal timeline or plan, but rather things that society imposes. They don't mean much to me on a personal level, but do unavoidably impact how I feel, since I cannot avoid societal influences as much as I sometimes wish I could.] that many folks see as an indicator of success (or the potential for it). One might imagine, then, that I felt quite a sensation as I was travelling in Paris during my most recent spring break, on the verge of graduating from Brown University after four years of work and extreme personal growth, and such an imagination would be highly warranted. As I took endless walks on the [Champ de Mars](https://en.wikipedia.org/wiki/Champ_de_Mars) and along the [Seine](https://en.wikipedia.org/wiki/Seine) many thoughts and musings were prompted by the grand sensations of emotion, grandeur, and wonder that I felt. They are largely concentrated around the theme of modern idolatry in the name of "success" and the implications of this, on both a personal and broader philosophical and societal level. My attempts to collect them into a format that I can share follow.
## Dichotomies
<figure class="prose-excerpt">
<blockquote>
"Everything is a dichotomy; that is perhaps the grandeur of life, of the Universe itself."
</blockquote>
<figcaption>Levi's personal journal, 29 January 2026</figcaption>
</figure>
::: dropcap
What of "success" do I understand, and what of it have I cumulatively failed to understand? Of course, this question depends on one's chosen definition of "success," so perhaps the most interesting approach is to parameterize our choice of definition. Indeed, SUCCESS is a concept that means different things to different people, so perhaps such parameterization is implicitly necessary. Yet such parameterization unsettles me greatly on a personal level. It is the first example of dichotomy that we, together, may explore.
:::
Society widely seems to view success as the fulfillment of goals rooted in extrinsic motivations. The credentialist nature of our society seems to conflate one's ability to earn a title with competence, experience, and, in some cases, worthiness - and who, exactly, is worthy of success, or, rather, is it success that deems one worthy in the eyes of the world? In more ways than one, it seems that we have been conditioned somehow through our institutions, both explicit and implicit, to conflate worthiness with success, and this conflation is perhaps grounded in the idea that success will be transitive; that is, one's continued association with successful people leads to more successful outcomes. This seems to imply that "success" is somehow a communal thing, inherently extrinsic that it diffuses and saturates, so long as those who have it^[For the sake of illustration here we are assuming that "success" is something to be had, a notion that will be debunked later.] are willing to continue associating with those who have less of it.
Yet this is in direct contrast to what is arguably the foundation of our^[I use "our" here to refer to citizens of the United States, my country of birth and the culture that largely influenced my perception of success.] success. The extrinsic nature of such success is not problematic, but the communal aspect is. The ethos of the [American Dream](https://en.wikipedia.org/wiki/American_Dream) is largely that of individualism - the promise that dense individual effort leads to success.

@@ -1,236 +0,0 @@
---
title: A Test Essay
date: 2026-03-14
abstract: A comprehensive end-to-end exercise of the Hakyll pipeline — typography, code, math, sidenotes, filters, tables, exhibits, and annotations.
tags: [meta]
affiliation: "Department of Imaginary Systems, University of Nowhere | https://example.com"
status: Working model
confidence: 72
importance: 3
evidence: 2
scope: average
novelty: moderate
practicality: moderate
confidence-history: [55, 63, 72]
history:
- date: "2026-03-01"
note: Initial draft
- date: "2026-03-14"
note: Expanded typography and citation sections; added math examples
---
The body typeface is Spectral, a screen-first serif with seven weights and full OpenType support. Old-style figures are enabled by default: the year 2026, the number 1984, Euler's number 2.718. Standard ligatures are active: *first*, *fifty*, *ffle*. The typographic principles informing this layout draw on Butterick[@butterick2019] and Tufte[@tufte1983]. This document is built with Pandoc[@pandoc].
Paragraphs following one another use first-line indentation in the traditional book manner, with no inter-paragraph vertical gap. This is the second paragraph of the opening section, and you should see the indent at the start of this line.
A third paragraph to confirm the indent is consistent across multiple consecutive paragraphs and does not drift or accumulate.
## Typography
### Headings
Headings are set in Fira Sans Semibold, a humanist sans-serif that complements Spectral. The hierarchy below demonstrates all levels used in practice.
## Section heading (H2)
### Subsection heading (H3)
#### Minor heading (H4)
##### Rarely used (H5)
Body text resumes here, following the heading sequence above. The vertical rhythm above each heading and the transition back to Spectral below it should feel natural, not abrupt.
### Inline Elements
This sentence demonstrates **bold emphasis (700)** and <strong class="semibold">semibold emphasis (600)</strong> side by side — the authorial choice the spec describes. Italic text looks like *this phrase set in Spectral italic*. Combined: ***bold italic***.
Abbreviations use Spectral's true small-caps via the `smcp` OpenType feature: the organisations <abbr title="National Science Foundation">NSF</abbr>, <abbr title="American Civil Liberties Union">ACLU</abbr>, and <abbr title="Central Intelligence Agency">CIA</abbr>. These should appear as genuine small capitals, not scaled-down full caps.
Superscripts use Spectral's `sups` glyphs: E = mc^2^, footnote reference^1^, ordinals like 1^st^ and 2^nd^. Subscripts use `subs`: H~2~O, CO~2~.
Inline code looks like `cabal run site -- build` and sits comfortably in a line of Spectral body text. The size differential and background tint should clearly distinguish it without being jarring.
### Blockquotes
> The site is the proof. If a site about careful writing is itself carelessly made, the argument is self-defeating. Every element must earn its presence.
Text resumes after the blockquote without indent — the indent reset rule is working if this line begins flush left.
> A nested quotation scenario: this outer blockquote contains ordinary text, establishing the left-border visual hierarchy.
## Code
JetBrains Mono is used for all code. Ligatures and contextual alternates are active: `->` `=>` `!=` `::` `>=` in inline code, and in blocks below.
```haskell
-- Hakyll site compiler entry point
module Main where
import Hakyll (hakyll)
import Site (rules)
main :: IO ()
main = hakyll rules
```
```css
/* CSS custom property example */
:root {
--bg: #faf8f4;
--text: #1a1a1a;
}
body {
background-color: var(--bg);
color: var(--text);
font-feature-settings: 'liga' 1, 'onum' 1;
}
```
```python
def greet(name: str) -> str:
return f"Hello, {name}!"
```
The code block border, background tint, and monospaced font should feel quiet — part of the page, not a jarring box.
## Tables
Tables use Fira Sans at 90% size, with lining figures and tabular spacing enabled for numeric alignment.
| Font | Role | Weight(s) | File size |
|:---------------|:----------------|:------------|:----------|
| Spectral | Body text | 400, 600, 700 | 2124 KB |
| Fira Sans | UI / headings | 400, 600 | 16 KB |
| JetBrains Mono | Code | 400 | 1920 KB |
## Dark Mode
Use the toggle in the top-right corner of the nav to switch between light and dark. Both themes use warm monochrome palettes derived from the same base hue. The background, text, borders, muted text, code blocks, and blockquote borders should all shift coherently.
Check the following specifically in dark mode: sidenotes, code block backgrounds, the blockquote border, and the table header row. The `transition` on `body` should make the switch feel smooth rather than abrupt.
- Background: `#1c1a18` (warm dark, not pure black)
- Text: `#e8e5df` (warm off-white, not pure white)
- Muted text, borders: proportionally darker warm greys
## Mathematics
The quadratic formula solves $ax^2 + bx + c = 0$ for real roots:
$$x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$$
This is a well-known result.[^quadratic] Euler's identity is often cited as the most beautiful equation in mathematics:
$$e^{i\pi} + 1 = 0$$
It connects the five most important constants in mathematics.[^euler] The CSS smallcaps filter should catch abbreviations like NASA, HTML, CSS, and API automatically.
[^quadratic]: The formula follows directly from completing the square. For a derivation, see any introductory algebra text, e.g. Stewart's *Precalculus*.
[^euler]: This follows from Euler's formula $e^{i\theta} = \cos\theta + i\sin\theta$ evaluated at $\theta = \pi$.
### Turán's Theorem
The Turán graph $T(n,k)$ is the complete $k$-partite graph on $n$ vertices with part sizes as equal as possible. Its edge count is given by the formula below — this is the identity the moving-vertex argument exploits.
::: {.exhibit .exhibit--equation data-exhibit-name="Turán Edge Count" data-exhibit-type="equation" data-exhibit-caption="Edge count of a complete k-partite graph: total pairs minus same-part pairs."}
:::: exhibit-body
$$\binom{n}{2} - \sum_{i=1}^{k}\binom{m_i}{2}$$
::::
:::
Every pair of vertices is adjacent *except* those within the same part, so the formula counts edges by subtracting same-part pairs from all pairs.
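
The counting rule above can be checked numerically. The following is a minimal sketch, not part of the site's code; the helper names `choose2` and `edgeCount` are assumptions introduced only for illustration. It computes the edge count of a complete multipartite graph from its part sizes by subtracting same-part pairs from all pairs.

```javascript
// Illustrative helpers (hypothetical names, not from the repository).

// Number of unordered pairs among m items: C(m, 2).
function choose2(m) {
  return (m * (m - 1)) / 2;
}

// Edge count of the complete multipartite graph with the given part sizes:
// all pairs of vertices minus the pairs that fall inside the same part.
function edgeCount(parts) {
  const n = parts.reduce((sum, m) => sum + m, 0);
  const samePart = parts.reduce((sum, m) => sum + choose2(m), 0);
  return choose2(n) - samePart;
}

// Balanced parts (the Turán graph T(5,3)) versus an unbalanced split.
const balanced = edgeCount([2, 2, 1]);
const unbalanced = edgeCount([3, 1, 1]);
console.log(balanced, unbalanced);
```

Comparing the two calls shows the balanced split having one more edge than the unbalanced one, which is the behaviour the moving-vertex argument predicts for parts of sizes 3 and 1.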
::: {.annotation .annotation--static}
<div class="annotation-header">
<span class="annotation-label">Remark</span>
<span class="annotation-name">Equal parts maximise edges</span>
</div>
<div class="annotation-body">
Intuitively: if two parts differ in size by more than one vertex, moving a vertex from the larger to the smaller part creates more cross-part pairs than it destroys within-part pairs. The moving-vertex argument below makes this precise.
</div>
:::
::: {.annotation .annotation--collapsible}
<div class="annotation-header">
<span class="annotation-label">Note</span>
<span class="annotation-name">Turán graph definition</span>
<button class="annotation-toggle" aria-expanded="false">▸ expand</button>
</div>
<div class="annotation-body">
The *Turán graph* $T(n,k)$ is the unique (up to isomorphism) complete $k$-partite graph on $n$ vertices whose part sizes differ by at most one. By Turán's theorem, $T(n,k)$ is the $K_{k+1}$-free graph on $n$ vertices with the maximum number of edges.
</div>
:::
::: {.exhibit .exhibit--proof data-exhibit-name="Turán Bound" data-exhibit-type="proof" data-exhibit-caption="Moving one vertex from the larger to the smaller part strictly increases the edge count when parts differ by ≥ 2."}
:::: exhibit-body
Without loss of generality suppose $n_1 - n_2 \ge 2$. Form a new complete $k$-partite graph by moving one vertex from part 1 to part 2. Since the new graph is still complete $k$-partite on the same $n$ vertices, it suffices to show it has strictly more edges.
The number of edges in any complete $k$-partite graph $M_{m_1,\ldots,m_k}$ is
$$\binom{n}{2} - \sum_{i=1}^{k}\binom{m_i}{2},$$
since every pair of vertices is adjacent *except* those within the same part. Therefore
$$|E(G')| - |E(G)| = \binom{n_1}{2} + \binom{n_2}{2} - \binom{n_1-1}{2} - \binom{n_2+1}{2}.$$
Using $\binom{m}{2} = \frac{m(m-1)}{2}$, this simplifies to $(n_1 - 1) - n_2 = n_1 - n_2 - 1$. Since $n_1 - n_2 \ge 2$, we get $|E(G')| - |E(G)| \ge 1 > 0$. [□]{.proof-qed}
::::
:::
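
The simplification step in the proof can be spelled out term by term. This is a worked expansion using only $\binom{m}{2} = \frac{m(m-1)}{2}$ and the same symbols as the proof; it adds no new assumptions.

$$\begin{aligned}
\binom{n_1}{2} - \binom{n_1-1}{2} &= \frac{(n_1-1)\bigl(n_1 - (n_1-2)\bigr)}{2} = n_1 - 1,\\
\binom{n_2}{2} - \binom{n_2+1}{2} &= \frac{n_2\bigl((n_2-1) - (n_2+1)\bigr)}{2} = -n_2.
\end{aligned}$$

Adding the two lines gives $|E(G')| - |E(G)| = (n_1 - 1) - n_2 = n_1 - n_2 - 1$, matching the expression in the proof.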
## Music Notation
Score fragments are embedded inline as responsive SVGs, integrated with the gallery focusable system. Clicking the fragment — or the expand glyph that appears on hover — opens the shared overlay. The SVG inherits the page's text color via `currentColor`, so notation renders correctly in both light and dark modes. The caption below the score is a persistent `<figcaption>`, in keeping with the convention of printed musical editions.
Prose commentary surrounds the fragment just as it would in an analytical text — above to introduce the passage, below to elaborate on what was shown.
## Links and Wikilinks
External links with domain classes: [Wikipedia on the quadratic formula](https://en.wikipedia.org/wiki/Quadratic_formula), an [arXiv preprint](https://arxiv.org/abs/1234.5678), a [DOI link](https://doi.org/10.1000/xyz123), and [jgm/pandoc on GitHub](https://github.com/jgm/pandoc). A generic external: [example.com](https://example.com).
An internal link [to the essay index](/essays/index.html) is left completely unchanged — no extra classes or attributes added.
Wikilinks: [[About This Site]] resolved from `[[About This Site]]`, and [[The Colophon|the colophon]] resolved from `[[The Colophon|the colophon]]`.
## Filter Output
### Abbreviations
`Filters.Typography` matches exact Pandoc `Str` tokens against a table of common Latin abbreviations and wraps them in `<abbr title="…">` elements. Hover over the highlighted abbreviations below to see the tooltip.
Common scholarly shorthand: e.g. the quadratic formula, i.e. the formula $x = \frac{-b \pm \sqrt{b^2-4ac}}{2a}$. See cf. Stewart §3.4. The argument follows from first principles, viz. the moving-vertex technique. NB: the result holds only for $k \ge 2$.
### Smallcaps
`Filters.Smallcaps` detects runs of three or more uppercase letters and wraps them in `<abbr class="smallcaps">`. Technology acronyms detected automatically: HTML, CSS, API, JSON, URL, NASA, MIT. Trailing punctuation is stripped before the check so HTTP, and REST. also work correctly.
Not converted: short tokens like I, OK (two letters), or mixed-case tokens like JavaScript, macOS, or LaTeX.
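
The detection rule described above can be sketched as a small predicate. This is a hedged illustration, not the actual `Filters.Smallcaps` implementation (which is a Pandoc filter): the function name `isSmallcapsCandidate`, the whole-token match, the ASCII-only letter class, and the exact set of trailing punctuation are all assumptions.

```javascript
// Hypothetical sketch of the rule: strip trailing punctuation, then require
// the remaining token to be three or more uppercase ASCII letters.
function isSmallcapsCandidate(token) {
  // Assumed punctuation set; the real filter may strip a different one.
  const core = token.replace(/[.,;:!?]+$/, '');
  return /^[A-Z]{3,}$/.test(core);
}

// Tokens taken from the surrounding text.
const samples = ['HTML', 'HTTP,', 'REST.', 'OK', 'I', 'JavaScript', 'macOS', 'LaTeX'];
for (const token of samples) {
  console.log(token, isSmallcapsCandidate(token));
}
```

Under these assumptions the acronyms with or without trailing punctuation pass, while two-letter and mixed-case tokens are left alone.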
### Annotations
::: {.annotation .annotation--static}
<div class="annotation-header">
<span class="annotation-label">Remark</span>
<span class="annotation-name">On static annotations</span>
</div>
<div class="annotation-body">
This is a static annotation. It is always visible and has no toggle. The border separates the header from the body.
</div>
:::
::: {.annotation .annotation--collapsible}
<div class="annotation-header">
<span class="annotation-label">Note</span>
<span class="annotation-name">On collapsible annotations</span>
<button class="annotation-toggle" aria-expanded="false">▸ expand</button>
</div>
<div class="annotation-body">
This annotation is collapsed by default. The abbreviations i.e. and e.g. should be wrapped in `<abbr>` tags by `Filters.Typography`. Clicking the button should expand and collapse this body smoothly, with the last line fully visible.
</div>
:::


@ -1,47 +0,0 @@
---
title: "Universities Should Care"
date: 2026-04-28 # required; used for ordering, feed, and display
abstract: > # optional; shown in the metadata block and link previews
Students should be more than a mere statistic to the Universities at which they study. I critique Brown University, my undergraduate institution, in this regard. The degradation of students to treatment as if they are a mere statistic is potentially a major reason for the decline in postsecondary education in the modern United States.
tags: # optional; see Tags section
- ai
- tech
# Epistemic profile — all optional; the entire section is hidden unless `status` is set
status: "Draft" # Draft | Working model | Durable | Refined | Superseded | Deprecated
confidence: 85 # 0-100 integer (%)
importance: 4 # 1-5 integer (rendered as filled/empty dots ●●●○○)
evidence: 5 # 1-5 integer (same)
scope: broad # personal | local | average | broad | civilizational
novelty: moderate # conventional | moderate | idiosyncratic | innovative
practicality: high # abstract | low | moderate | high | exceptional
confidence-history: # list of integers; trend arrow derived from last two entries
---
---
Planning: List of grievances
COMPUTER SCIENCE
- TA System section.
-
RES LIFE
- Obviously: repeated requests for discussion and process for moving out in Fall '23.
- Unable to control heat
- Lack of bathrooms.
- Lack of kitchens
DINING
- Let's run through some calculations to see the actual cost of every meal averaged across a semester.
- No real late night options.
- Poor optimization of queues / high demand items like grilled cheese.
- Inconsistent pricing for the same items across locations.
SECURITY
- No substantive changes since December 13th.
EFFECTS ON THE CULTURE


@ -13,7 +13,6 @@ importance: 1
scope: personal scope: personal
novelty: conventional novelty: conventional
practicality: moderate practicality: moderate
confidence-history:
--- ---
A fuller write-up follows. In the meantime, see the [projects index](/cv/projects/). A fuller write-up follows. In the meantime, see the [projects index](/cv/projects/).


@ -18,7 +18,6 @@ evidence: 4
scope: broad scope: broad
novelty: innovative novelty: innovative
practicality: high practicality: high
confidence-history:
--- ---
A fuller write-up follows with the clinical-implications manuscript. In the meantime, see the [projects index](/cv/projects/). A fuller write-up follows with the clinical-implications manuscript. In the meantime, see the [projects index](/cv/projects/).


@ -1,23 +1,20 @@
--- ---
title: "Speculative Reluctance" title: "Speculative Reluctance"
date: 2026-04-15 # required; used for ordering, feed, and display date: 2026-04-15
abstract: > # optional; shown in the metadata block and link previews abstract: >
AI labs are likely deliberately reluctant to scale because they are aware that any imminent shift to locally run models as the norm would render their compute redundant. We take Anthropic as a principal case study to validate this hypothesis. AI labs are likely deliberately reluctant to scale because they are aware that any imminent shift to locally run models as the norm would render their compute redundant. We take Anthropic as a principal case study to validate this hypothesis.
tags: # optional; see Tags section tags:
- ai - ai
- tech - tech
- speculative - speculative
- open - open
status: "Draft"
# Epistemic profile — all optional; the entire section is hidden unless `status` is set confidence: 55
status: "Draft" # Draft | Working model | Durable | Refined | Superseded | Deprecated importance: 3
confidence: 55 # 0-100 integer (%) importance: 3
importance: 3 # 1-5 integer (rendered as filled/empty dots ●●●○○) evidence: 1
evidence: 1 # 1-5 integer (same) scope: broad
scope: broad # personal | local | average | broad | civilizational practicality: high
novelty: moderate # conventional | moderate | idiosyncratic | innovative
practicality: high # abstract | low | moderate | high | exceptional
confidence-history: # list of integers; trend arrow derived from last two entries
--- ---
Running a lab that develops frontier LLMs is somewhat like playing a game that, by all measurable metrics external, you are bound to lose. The amount of compute required to train a frontier LLM is unbelievably expensive. The expense of inference is even more astronomical. OpenAI claims at the time of this writing to have somewhere between 900 Million and 1 Billion active users, all of whom require some amount of inference cost, and some small subset of whom consume an enormous amount of compute - to use their words, this is ["commercial scale."](https://openai.com/index/accelerating-the-next-phase-ai/). This isn't to mention the immense amount of competition - there are many major players in the United States alone contributing models that push the boundaries. OpenAI may have been the first, but Anthropic, Google, Meta, xAI, and, yes, even Amazon and Bytedance are following right along. Running a lab that develops frontier LLMs is somewhat like playing a game that, by all measurable metrics external, you are bound to lose. The amount of compute required to train a frontier LLM is unbelievably expensive. The expense of inference is even more astronomical. OpenAI claims at the time of this writing to have somewhere between 900 Million and 1 Billion active users, all of whom require some amount of inference cost, and some small subset of whom consume an enormous amount of compute - to use their words, this is ["commercial scale."](https://openai.com/index/accelerating-the-next-phase-ai/). This isn't to mention the immense amount of competition - there are many major players in the United States alone contributing models that push the boundaries. OpenAI may have been the first, but Anthropic, Google, Meta, xAI, and, yes, even Amazon and Bytedance are following right along.


@ -19,7 +19,6 @@ evidence: 5
scope: civilizational scope: civilizational
novelty: innovative novelty: innovative
practicality: moderate practicality: moderate
confidence-history:
--- ---
There are at least two distinct ways to reduce the search space over which AGI^[The definition of "Artificial General Intelligence", or whether such a definition exists, is contentious. My use of the term is not intended to endorse any proposed timeline for AGI, nor to suggest that it is inevitable. It is rather to provide calibration through a hypothetical goal that clearly justifies pursuit.] will have to operate. The first involves a harmonious interaction of agent and human, not transactional in origin, not fully autonomous nor fully human-driven, but rather collaborative in nature - the agent augments the capacity of the human, just as any other good tool for thought does, by working within the scope of something well specified and ideated upon. This is not to say that the agent cannot have a place in such planning, but rather that the human is ultimately the driver of the actions and tasks, defining the scope of what is to be done in as much detail as possible without being the one to actually do it. There are at least two distinct ways to reduce the search space over which AGI^[The definition of "Artificial General Intelligence", or whether such a definition exists, is contentious. My use of the term is not intended to endorse any proposed timeline for AGI, nor to suggest that it is inevitable. It is rather to provide calibration through a hypothetical goal that clearly justifies pursuit.] will have to operate. The first involves a harmonious interaction of agent and human, not transactional in origin, not fully autonomous nor fully human-driven, but rather collaborative in nature - the agent augments the capacity of the human, just as any other good tool for thought does, by working within the scope of something well specified and ideated upon. 
This is not to say that the agent cannot have a place in such planning, but rather that the human is ultimately the driver of the actions and tasks, defining the scope of what is to be done in as much detail as possible without being the one to actually do it.


@ -14,7 +14,6 @@ importance: 1
scope: local scope: local
novelty: moderate novelty: moderate
practicality: low practicality: low
confidence-history:
--- ---
A fuller write-up follows. In the meantime, see the [projects index](/cv/projects/). A fuller write-up follows. In the meantime, see the [projects index](/cv/projects/).


@ -19,7 +19,7 @@ authors:
affiliation: affiliation:
- "Department of Computer Science, Brown University | https://cs.brown.edu" - "Department of Computer Science, Brown University | https://cs.brown.edu"
bibliography: data/simd-paper.bib bibliography: data/simd-paper.bib
repository: "https://git.levineuwirth.org/where-simd-helps" repository: "https://git.levineuwirth.org/neuwirth/where-simd-helps"
--- ---
## Introduction ## Introduction


@ -42,8 +42,20 @@ add_header Permissions-Policy
# report stream has been clean for a week. # report stream has been clean for a week.
# #
# External origins justified inline: # External origins justified inline:
# cdn.jsdelivr.net KaTeX CSS + JS, Vega / Vega-Lite / Vega-Embed # cdn.jsdelivr.net KaTeX CSS + JS + webfonts (the KaTeX CSS
# references its fonts relatively, so they
# resolve to the CDN -> font-src), Vega /
# Vega-Lite / Vega-Embed, transformers.js
# (whose onnxruntime fetches its .wasm from
# the CDN via fetch() -> connect-src)
# *.basemaps.cartocdn.com Leaflet basemap tiles (photography map only) # *.basemaps.cartocdn.com Leaflet basemap tiles (photography map only)
# connect-src API hosts link-popup providers fetched directly via
# CORS (the list popups.js documents in its
# header, plus git.levineuwirth.org for the
# Forgejo provider). The CORS-broken trio
# (arxiv, archive.org, pubmed) goes through
# the same-origin /proxy/ instead — see
# nginx/popup-proxy.conf.
# #
# Why 'unsafe-inline' on style: # Why 'unsafe-inline' on style:
# - photography.html emits <span style="background:$swatch$"> for # - photography.html emits <span style="background:$swatch$"> for
@ -53,18 +65,14 @@ add_header Permissions-Policy
# Why 'unsafe-eval' on script: # Why 'unsafe-eval' on script:
# - vega-embed compiles Vega-Lite specs at runtime via new Function(). # - vega-embed compiles Vega-Lite specs at runtime via new Function().
# Removing this would require pre-compiling specs at build time. # Removing this would require pre-compiling specs at build time.
# - it also covers WebAssembly.instantiate for onnxruntime-web
# (semantic search).
#
# The value MUST stay on one physical line: nginx has no line
# continuation inside quoted strings — a trailing backslash would embed
# literal backslash + LF bytes in the header value, which is illegal in
# HTTP/2 and gets whole responses rejected by strict clients.
# #
# To collect violation reports, set up a `report-uri` endpoint and add # To collect violation reports, set up a `report-uri` endpoint and add
# `report-uri /csp-report;` (and/or `report-to <group>;`) below. # `report-uri /csp-report;` (and/or `report-to <group>;`) below.
add_header Content-Security-Policy-Report-Only add_header Content-Security-Policy-Report-Only "default-src 'self'; script-src 'self' 'unsafe-eval' https://cdn.jsdelivr.net; style-src 'self' 'unsafe-inline' https://cdn.jsdelivr.net; img-src 'self' data: https://*.basemaps.cartocdn.com; font-src 'self' data: https://cdn.jsdelivr.net; connect-src 'self' https://cdn.jsdelivr.net https://*.wikipedia.org https://api.crossref.org https://api.github.com https://openlibrary.org https://api.biorxiv.org https://www.youtube.com https://git.levineuwirth.org; frame-ancestors 'none'; base-uri 'self'; form-action 'self'; object-src 'none'; upgrade-insecure-requests" always;
"default-src 'self'; \
script-src 'self' 'unsafe-eval' https://cdn.jsdelivr.net; \
style-src 'self' 'unsafe-inline' https://cdn.jsdelivr.net; \
img-src 'self' data: https://*.basemaps.cartocdn.com; \
font-src 'self' data:; \
connect-src 'self'; \
frame-ancestors 'none'; \
base-uri 'self'; \
form-action 'self'; \
object-src 'none'; \
upgrade-insecure-requests" always;
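
The one-physical-line constraint described in the comment above could be enforced by generating the header value rather than hand-editing it. The following is a minimal sketch under that assumption; the helper `buildCsp` is hypothetical and is not part of this repository, and the directive map shown is an abbreviated subset of the policy in the hunk.

```javascript
// Hypothetical build-time helper (not in the repo): joins CSP directives
// into a single-line header value and rejects anything that would embed a
// line break or backslash in the value.
function buildCsp(directives) {
  const value = Object.entries(directives)
    .map(([name, sources]) =>
      sources.length > 0 ? `${name} ${sources.join(' ')}` : name)
    .join('; ');
  if (/[\r\n\\]/.test(value)) {
    throw new Error('CSP value must stay on one physical line');
  }
  return value;
}

// Abbreviated example; source lists copied from the policy above.
const csp = buildCsp({
  'default-src': ["'self'"],
  'font-src': ["'self'", 'data:', 'https://cdn.jsdelivr.net'],
  'object-src': ["'none'"],
  'upgrade-insecure-requests': [],
});
console.log(csp);
```

The resulting string would still need to be pasted or templated into the quoted `add_header` argument; how that is wired into the nginx config is left open here.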


@ -7,7 +7,6 @@ dependencies = [
# Visualization # Visualization
"matplotlib>=3.9,<4", "matplotlib>=3.9,<4",
"altair>=5.4,<6", "altair>=5.4,<6",
# Embedding pipeline # Embedding pipeline
# Upper bounds are intentionally generous (next major) but always # Upper bounds are intentionally generous (next major) but always
# present so that an unrelated `uv sync` upgrade can't silently pull # present so that an unrelated `uv sync` upgrade can't silently pull
@ -18,7 +17,6 @@ dependencies = [
"beautifulsoup4>=4.12,<5", "beautifulsoup4>=4.12,<5",
# CPU-only torch — avoids pulling ~3 GB of CUDA libraries # CPU-only torch — avoids pulling ~3 GB of CUDA libraries
"torch>=2.5,<3", "torch>=2.5,<3",
# Photography pipeline # Photography pipeline
# Pillow handles EXIF reading when exiftool is not installed (the # Pillow handles EXIF reading when exiftool is not installed (the
# preferred path); colorthief computes the 5-color palette strip. # preferred path); colorthief computes the 5-color palette strip.
@ -26,6 +24,10 @@ dependencies = [
"pillow>=10.0,<12", "pillow>=10.0,<12",
"colorthief>=0.2,<1", "colorthief>=0.2,<1",
"pyyaml>=6.0,<7", "pyyaml>=6.0,<7",
# Not imported by this repo: required at runtime by nomic-embed's
# remote modeling code (nomic-bert-2048, loaded by embed.py's page
# pass under trust_remote_code with a pinned code_revision).
"einops>=0.8.2,<1",
] ]
[[tool.uv.index]] [[tool.uv.index]]

Binary file not shown. Size before: 1.3 KiB; after: 20 KiB.


@ -70,34 +70,35 @@ nav.site-nav {
} }
/* Home logo square button flush into the top-left corner of the nav bar. /* Home logo square button flush into the top-left corner of the nav bar.
The L silhouette is rendered via ::before mask-image so the background The rooted-L mark lives in /logo-sprite.svg and is referenced with
matches --bg-nav exactly and the foreground follows --nav-logo-fg (set <use> (cacheable once, not ~33 KB inlined per page). Its two-tone
per theme in base.css override there to restyle for light mode). */ cutout still renders because CSS custom properties cascade into the
use-element shadow tree: the letter is drawn in --logo-ink and the
root filament is punched through in --logo-bg. Mapping --logo-bg to
--bg-nav (the button's own surface) makes the roots read as the nav
background showing through. Both tokens are theme-driven in
base.css override --nav-logo-fg / --bg-nav there to restyle per
theme. */
.nav-logo { .nav-logo {
position: absolute; position: absolute;
left: 0; left: 0;
top: 0; top: 0;
bottom: 0; bottom: 0;
aspect-ratio: 1 / 1; aspect-ratio: 1 / 1;
display: block; display: flex;
align-items: center;
justify-content: center;
overflow: hidden; overflow: hidden;
flex-shrink: 0; flex-shrink: 0;
text-decoration: none; text-decoration: none;
background-color: var(--bg-nav); background-color: var(--bg-nav);
--logo-ink: var(--nav-logo-fg);
--logo-bg: var(--bg-nav);
} }
.nav-logo::before { .nav-logo__mark {
content: ''; width: 76%;
position: absolute; height: 76%;
inset: 12%; display: block;
background-color: var(--nav-logo-fg);
mask-image: url('/images/link-icons/internal.svg');
mask-size: contain;
mask-repeat: no-repeat;
mask-position: center;
-webkit-mask-image: url('/images/link-icons/internal.svg');
-webkit-mask-size: contain;
-webkit-mask-repeat: no-repeat;
-webkit-mask-position: center;
} }
/* Controls cluster: portals toggle + theme toggle, pinned right */ /* Controls cluster: portals toggle + theme toggle, pinned right */


@ -16,8 +16,10 @@
For an inline <span> inside a <p>, this is roughly the line containing For an inline <span> inside a <p>, this is roughly the line containing
the sidenote reference, giving correct vertical alignment without JS. the sidenote reference, giving correct vertical alignment without JS.
On narrow viewports the <span> is hidden and the Pandoc-generated On narrow viewports the <span> is hidden and the
<section class="footnotes"> at document end is shown instead. <section class="footnotes"> the Sidenotes filter appends at document
end is shown instead (Pandoc's own footnote section never exists
the filter consumes every Note, and re-emits this fallback itself).
*/ */
/* ============================================================ /* ============================================================
@ -137,22 +139,54 @@
/* ============================================================ /* ============================================================
FOOTNOTE REFERENCES shown on narrow viewports alongside FOOTNOTES FALLBACK LIST the section the Sidenotes filter
section.footnotes appends at document end; visible on narrow viewports only
(see the media queries above). Letter labels are rendered
explicitly because an <ol>'s automatic numbers would disagree
with the in-text letter refs.
============================================================ */ ============================================================ */
a.footnote-ref { section.footnotes .footnotes-list {
text-decoration: none; list-style: none;
color: var(--text-faint); margin: 0;
font-size: 0.75em; padding: 0;
line-height: 0; }
.footnote-item {
position: relative; position: relative;
top: -0.4em; padding-left: 1.5rem;
margin-bottom: 0.85rem;
font-size: 0.85rem;
line-height: 1.6;
color: var(--text-muted);
}
.footnote-label {
position: absolute;
left: 0;
top: 0.15em;
font-family: var(--font-sans); font-family: var(--font-sans);
font-size: 0.75em;
color: var(--text-faint);
}
/* First paragraph flows on the label's line; later ones stack. */
.footnote-item > p {
margin: 0 0 0.5em;
}
.footnote-item > p:first-of-type {
display: inline;
}
.footnote-back {
margin-left: 0.35em;
text-decoration: none;
font-family: var(--font-sans);
color: var(--text-faint);
transition: color var(--transition-fast); transition: color var(--transition-fast);
} }
a.footnote-ref:hover { .footnote-back:hover {
color: var(--text-muted); color: var(--text-muted);
} }

Binary file not shown. Size before: 16 KiB; after: 8.8 KiB.

Binary file not shown. Size before: 15 KiB; after: 15 KiB.

File diff suppressed because one or more lines are too long. Size before: 114 KiB; after: 32 KiB.

Binary file not shown. Size before: 4.0 MiB; after: 1.8 MiB.


@ -12,6 +12,8 @@
var STORAGE_KEY = 'site-annotations'; var STORAGE_KEY = 'site-annotations';
var tooltip = null; var tooltip = null;
var tooltipTimer = null; var tooltipTimer = null;
var tooltipPinned = false; /* keyboard-opened: blur must not dismiss */
var tooltipMark = null; /* mark that opened the tooltip, for focus return */
/* ------------------------------------------------------------------ /* ------------------------------------------------------------------
Storage Storage
@ -148,6 +150,18 @@
tooltip.addEventListener('mouseenter', function () { clearTimeout(tooltipTimer); }); tooltip.addEventListener('mouseenter', function () { clearTimeout(tooltipTimer); });
tooltip.addEventListener('mouseleave', function () { hideTooltip(false); }); tooltip.addEventListener('mouseleave', function () { hideTooltip(false); });
/* Keyboard flow: Escape closes a pinned tooltip and returns focus
to its mark; tabbing out of the tooltip dismisses it. */
tooltip.addEventListener('keydown', function (e) {
if (e.key === 'Escape') {
hideTooltip(true);
if (tooltipMark) tooltipMark.focus();
}
});
tooltip.addEventListener('focusout', function (e) {
if (!tooltip.contains(e.relatedTarget)) hideTooltip(false);
});
} }
/* Defer to the shared utility (loaded synchronously from /* Defer to the shared utility (loaded synchronously from
@ -159,6 +173,8 @@
function showTooltip(mark, ann) { function showTooltip(mark, ann) {
clearTimeout(tooltipTimer); clearTimeout(tooltipTimer);
tooltipPinned = false;
tooltipMark = mark;
var note = ann.note || ''; var note = ann.note || '';
var created = ann.created ? new Date(ann.created).toLocaleDateString() : ''; var created = ann.created ? new Date(ann.created).toLocaleDateString() : '';
@ -197,6 +213,7 @@
function hideTooltip(immediate) { function hideTooltip(immediate) {
clearTimeout(tooltipTimer); clearTimeout(tooltipTimer);
tooltipPinned = false;
if (immediate) { if (immediate) {
if (tooltip) tooltip.classList.remove('is-visible'); if (tooltip) tooltip.classList.remove('is-visible');
} else { } else {
@ -212,6 +229,28 @@
showTooltip(mark, ann); showTooltip(mark, ann);
}); });
mark.addEventListener('mouseleave', function () { hideTooltip(false); }); mark.addEventListener('mouseleave', function () { hideTooltip(false); });
/* Keyboard: focus mirrors hover; Enter/Space pins the tooltip and
moves focus to its Delete button; Escape dismisses. */
mark.setAttribute('tabindex', '0');
mark.addEventListener('focus', function () {
clearTimeout(tooltipTimer);
showTooltip(mark, ann);
});
mark.addEventListener('blur', function () {
if (!tooltipPinned) hideTooltip(false);
});
mark.addEventListener('keydown', function (e) {
if (e.key === 'Enter' || e.key === ' ') {
e.preventDefault();
showTooltip(mark, ann);
tooltipPinned = true;
var del = tooltip.querySelector('.ann-tooltip-delete');
if (del) del.focus();
} else if (e.key === 'Escape') {
hideTooltip(true);
}
});
} }
/* ------------------------------------------------------------------ /* ------------------------------------------------------------------
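
The keyboard flow added in the hunks above (focus mirrors hover, Enter/Space pins, blur only dismisses an unpinned tooltip, Escape always dismisses) can be modelled as a small pure state transition. This is an illustrative sketch only: `nextTooltipState` and its event names are hypothetical, the real code mutates the DOM and the `tooltipPinned` flag directly, and the delayed hide used on blur is collapsed here into an immediate one.

```javascript
// Hypothetical model of the tooltip's keyboard behaviour (not repo code).
// state: { visible: boolean, pinned: boolean }
// event: 'focus' | 'activate' (Enter/Space) | 'blur' | 'escape'
function nextTooltipState(state, event) {
  switch (event) {
    case 'focus':
      // Showing the tooltip always clears any previous pin.
      return { visible: true, pinned: false };
    case 'activate':
      // Enter/Space shows the tooltip and pins it.
      return { visible: true, pinned: true };
    case 'blur':
      // A pinned tooltip survives the mark losing focus.
      return state.pinned ? state : { visible: false, pinned: false };
    case 'escape':
      // Hiding always clears the pin.
      return { visible: false, pinned: false };
    default:
      return state;
  }
}

// Walk through: focus the mark, pin with Enter, blur, then Escape.
let state = { visible: false, pinned: false };
for (const event of ['focus', 'activate', 'blur', 'escape']) {
  state = nextTooltipState(state, event);
  console.log(event, state);
}
```

The point of the pin flag in this model is the third step: focus moving from the mark to the Delete button fires a blur on the mark, and without the pin that blur would dismiss the tooltip before the button could be used.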


@ -1,86 +0,0 @@
/* citations.js hover tooltip for inline citation markers.
On hover of a .cite-marker, reads the matching bibliography entry from
the DOM and shows it in a floating tooltip. On click, follows the href
to jump to the bibliography section. Phase 3 popups.js can supersede this. */
(function () {
'use strict';
let activeTooltip = null;
let hideTimer = null;
function makeTooltip(html) {
const el = document.createElement('div');
el.className = 'cite-tooltip';
el.innerHTML = html;
el.addEventListener('mouseenter', () => clearTimeout(hideTimer));
el.addEventListener('mouseleave', scheduleHide);
return el;
}
function positionTooltip(tooltip, anchor) {
document.body.appendChild(tooltip);
const aRect = anchor.getBoundingClientRect();
const tRect = tooltip.getBoundingClientRect();
let left = aRect.left + window.scrollX;
let top = aRect.top + window.scrollY - tRect.height - 10;
// Keep horizontally within viewport with margin
const maxLeft = window.innerWidth - tRect.width - 12;
left = Math.max(8, Math.min(left, maxLeft));
// Flip below anchor if not enough room above
if (top < window.scrollY + 8) {
top = aRect.bottom + window.scrollY + 10;
}
tooltip.style.left = left + 'px';
tooltip.style.top = top + 'px';
}
function scheduleHide() {
hideTimer = setTimeout(() => {
if (activeTooltip) {
activeTooltip.remove();
activeTooltip = null;
}
}, 180);
}
function getRefHtml(refEl) {
// Strip the [N] number span, return the remaining innerHTML
const clone = refEl.cloneNode(true);
const num = clone.querySelector('.ref-num');
if (num) num.remove();
return clone.innerHTML.trim();
}
function init() {
document.querySelectorAll('.cite-marker').forEach(marker => {
const link = marker.querySelector('a.cite-link');
if (!link) return;
const href = link.getAttribute('href');
if (!href || !href.startsWith('#')) return;
const refEl = document.getElementById(href.slice(1));
if (!refEl) return;
marker.addEventListener('mouseenter', () => {
clearTimeout(hideTimer);
if (activeTooltip) { activeTooltip.remove(); }
activeTooltip = makeTooltip(getRefHtml(refEl));
positionTooltip(activeTooltip, marker);
});
marker.addEventListener('mouseleave', scheduleHide);
});
}
if (document.readyState === 'loading') {
document.addEventListener('DOMContentLoaded', init);
} else {
init();
}
})();


@ -9,9 +9,18 @@
(function () { (function () {
'use strict'; 'use strict';
var PREFIX = 'section-collapsed:'; /* Keys are namespaced by pathname: Pandoc auto-slugs (#introduction,
#background) recur across essays, and an un-namespaced key would
collapse the same-named section on every page. */
var PREFIX = 'section-collapsed:' + location.pathname + ':';
var store = window.lnUtils && window.lnUtils.safeStorage;
function initHeading(heading) { function initHeading(heading) {
// Idempotence guard: reinitCollapse may be called more than once on
// the same container — never re-wrap a section or stack toggle
// buttons (matches the popups.js/sidenotes.js convention).
if (heading.dataset.collapseBound === '1') return;
var level = parseInt(heading.tagName[1], 10); var level = parseInt(heading.tagName[1], 10);
var content = []; var content = [];
var node = heading.nextElementSibling; var node = heading.nextElementSibling;
@ -24,6 +33,7 @@
node = node.nextElementSibling; node = node.nextElementSibling;
} }
if (!content.length) return; if (!content.length) return;
heading.dataset.collapseBound = '1';
// Wrap collected nodes in a .section-body div. // Wrap collected nodes in a .section-body div.
var wrapper = document.createElement('div'); var wrapper = document.createElement('div');
@ -41,7 +51,7 @@
// Restore persisted state without transition flash. // Restore persisted state without transition flash.
var key = PREFIX + heading.id; var key = PREFIX + heading.id;
var collapsed = localStorage.getItem(key) === '1'; var collapsed = store ? store.get(key) === '1' : false;
function setCollapsed(c, animate) { function setCollapsed(c, animate) {
if (!animate) wrapper.style.transition = 'none'; if (!animate) wrapper.style.transition = 'none';
@ -80,7 +90,7 @@
void wrapper.offsetHeight; // force reflow void wrapper.offsetHeight; // force reflow
} }
setCollapsed(!isCollapsed, true); setCollapsed(!isCollapsed, true);
localStorage.setItem(key, isCollapsed ? '0' : '1'); if (store) store.set(key, isCollapsed ? '0' : '1');
}); });
// After open animation: release the height cap so late-rendering // After open animation: release the height cap so late-rendering


@ -17,9 +17,18 @@
btn.setAttribute('aria-label', 'Copy code to clipboard'); btn.setAttribute('aria-label', 'Copy code to clipboard');
btn.addEventListener('click', function () { btn.addEventListener('click', function () {
var text = pre.querySelector('code') var code = pre.querySelector('code');
? pre.querySelector('code').innerText var text;
: pre.innerText; if (code) {
text = code.innerText;
} else {
/* Code-less <pre>: clone and strip the injected button so
its label is not copied along with the content. */
var clone = pre.cloneNode(true);
var cloneBtn = clone.querySelector('.copy-btn');
if (cloneBtn) cloneBtn.remove();
text = clone.innerText;
}
navigator.clipboard.writeText(text).then(function () { navigator.clipboard.writeText(text).then(function () {
btn.textContent = 'copied'; btn.textContent = 'copied';


@ -88,6 +88,21 @@
return exhibit.dataset.exhibitCaption || ''; return exhibit.dataset.exhibitCaption || '';
} }
/* Make an exhibit wrapper keyboard-operable: role=button, tabindex,
and Enter/Space sharing the click path. closeOverlay()'s focus
return relies on the wrapper being focusable. */
function bindActivation(el, activate) {
el.setAttribute('role', 'button');
el.setAttribute('tabindex', '0');
el.addEventListener('click', activate);
el.addEventListener('keydown', function (e) {
if (e.key === 'Enter' || e.key === ' ') {
e.preventDefault();
activate();
}
});
}
function discoverFocusableMath(markdownBody) { function discoverFocusableMath(markdownBody) {
markdownBody.querySelectorAll('.katex-display').forEach(function (katexEl) { markdownBody.querySelectorAll('.katex-display').forEach(function (katexEl) {
var source = getSource(katexEl); var source = getSource(katexEl);
@ -118,8 +133,8 @@
}; };
focusables.push(entry); focusables.push(entry);
/* Click anywhere on the wrapper opens the overlay */ /* Click or Enter/Space anywhere on the wrapper opens the overlay */
wrapper.addEventListener('click', function () { bindActivation(wrapper, function () {
openOverlay(focusables.indexOf(entry)); openOverlay(focusables.indexOf(entry));
}); });
}); });
@ -151,7 +166,7 @@
}; };
focusables.push(entry); focusables.push(entry);
figEl.addEventListener('click', function () { bindActivation(figEl, function () {
openOverlay(focusables.indexOf(entry)); openOverlay(focusables.indexOf(entry));
}); });
}); });


@ -165,7 +165,12 @@
var images = document.querySelectorAll('img[data-lightbox]'); var images = document.querySelectorAll('img[data-lightbox]');
images.forEach(function (el) { images.forEach(function (el) {
el.addEventListener('click', function () { // Keyboard activation: the trigger acts as a button, and the
// tabindex also lets close() return focus to it.
el.setAttribute('tabindex', '0');
el.setAttribute('role', 'button');
function activate() {
// Look for a sibling figcaption in the parent figure // Look for a sibling figcaption in the parent figure
var figcaptionText = ''; var figcaptionText = '';
var parent = el.parentElement; var parent = el.parentElement;
@ -176,6 +181,14 @@
} }
} }
open(el.src, el.alt, figcaptionText, el); open(el.src, el.alt, figcaptionText, el);
}
el.addEventListener('click', activate);
el.addEventListener('keydown', function (e) {
if (e.key === 'Enter' || e.key === ' ') {
e.preventDefault();
activate();
}
}); });
}); });
@ -199,11 +212,42 @@
setInfoVisible(!overlay.classList.contains('is-info-visible')); setInfoVisible(!overlay.classList.contains('is-info-visible'));
}); });
// Escape closes; "i" toggles info panel (darkroom only). /* Focus trap for the overlay: cycle Tab/Shift+Tab through the
focusable controls inside the lightbox so keyboard users
cannot tab out into the obscured page background. Same
approach as gallery.js's trapTab; the [hidden] exclusion
covers infoBtn, which is hidden outside darkroom mode. */
function trapTab(e) {
var focusable = Array.from(overlay.querySelectorAll(
'button:not([disabled]):not([hidden]), [tabindex]:not([tabindex="-1"])'
));
if (focusable.length === 0) {
e.preventDefault();
return;
}
var first = focusable[0];
var last = focusable[focusable.length - 1];
var active = document.activeElement;
if (e.shiftKey) {
if (active === first || !overlay.contains(active)) {
e.preventDefault();
last.focus();
}
} else {
if (active === last || !overlay.contains(active)) {
e.preventDefault();
first.focus();
}
}
}
// Escape closes; Tab is trapped; "i" toggles info panel (darkroom only).
document.addEventListener('keydown', function (e) { document.addEventListener('keydown', function (e) {
if (!overlay.classList.contains('is-open')) return; if (!overlay.classList.contains('is-open')) return;
if (e.key === 'Escape') { if (e.key === 'Escape') {
close(); close();
} else if (e.key === 'Tab') {
trapTab(e);
} else if ((e.key === 'i' || e.key === 'I') } else if ((e.key === 'i' || e.key === 'I')
&& overlay.classList.contains('darkroom') && overlay.classList.contains('darkroom')
&& !infoBtn.hidden) { && !infoBtn.hidden) {


@ -17,17 +17,23 @@
const toggle = document.querySelector('.nav-portal-toggle'); const toggle = document.querySelector('.nav-portal-toggle');
if (!portals || !toggle) return; if (!portals || !toggle) return;
// safeStorage (utils.js, loaded synchronously before us) so a
// storage-blocked context can't throw before the click listener
// below binds; guarded like theme.js in case utils.js itself
// failed to load.
const store = window.lnUtils && window.lnUtils.safeStorage;
function setOpen(open) { function setOpen(open) {
portals.classList.toggle('is-open', open); portals.classList.toggle('is-open', open);
toggle.setAttribute('aria-expanded', String(open)); toggle.setAttribute('aria-expanded', String(open));
// Rotate arrow indicator if present. // Rotate arrow indicator if present.
const arrow = toggle.querySelector('.nav-portal-arrow'); const arrow = toggle.querySelector('.nav-portal-arrow');
if (arrow) arrow.textContent = open ? '▲' : '▼'; if (arrow) arrow.textContent = open ? '▲' : '▼';
localStorage.setItem(STORAGE_KEY, open ? '1' : '0'); if (store) store.set(STORAGE_KEY, open ? '1' : '0');
} }
// Restore persisted state; default is collapsed. // Restore persisted state; default is collapsed.
const stored = localStorage.getItem(STORAGE_KEY); const stored = store ? store.get(STORAGE_KEY) : null;
setOpen(stored === '1'); setOpen(stored === '1');
toggle.addEventListener('click', function () { toggle.addEventListener('click', function () {


@ -472,7 +472,12 @@
if (!match) return Promise.resolve(null); if (!match) return Promise.resolve(null);
var ctx = { match: match, href: href }; var ctx = { match: match, href: href };
var url = p.url(ctx); /* p.url runs synchronously (before the .catch below attaches) and
can throw, e.g. decodeURIComponent on a malformed percent
sequence in the link path. Treat a throw as "no popup". */
var url;
try { url = p.url(ctx); }
catch (e) { return Promise.resolve(null); }
var fetcher = p.fetchType === 'xml' ? fetchXml : fetchJson; var fetcher = p.fetchType === 'xml' ? fetchXml : fetchJson;
return fetcher(url, p.fetchInit).then(function (data) { return fetcher(url, p.fetchInit).then(function (data) {
@ -951,10 +956,10 @@
var agoDays = daysBetween(start, today); var agoDays = daysBetween(start, today);
/* "~" prefix when we've rounded to a unit larger than days. */ /* "~" prefix when we've rounded to a unit larger than days. */
var span = humanDuration(spanDays, true); var span = humanDuration(spanDays, true);
var ago = humanAgo(agoDays); var ago = humanAgo(agoDays); /* '' when start is in the future */
lines.push( lines.push(
'<div class="popup-date-primary">' '<div class="popup-date-primary">'
+ esc(span) + ' · started ' + esc(ago) + esc(span) + (ago ? ' · started ' + esc(ago) : '')
+ '</div>'); + '</div>');
if (commits && /^\d+$/.test(commits)) { if (commits && /^\d+$/.test(commits)) {
var n = parseInt(commits, 10); var n = parseInt(commits, 10);
@ -965,11 +970,17 @@
} }
} else { } else {
var days = daysBetween(start, today); var days = daysBetween(start, today);
lines.push( var ago2 = humanAgo(days); /* '' when the date is in the future */
'<div class="popup-date-primary">' if (ago2) {
+ esc(humanAgo(days)) + '</div>'); lines.push(
'<div class="popup-date-primary">'
+ esc(ago2) + '</div>');
}
} }
/* Nothing renderable (e.g. a lone future date): no popup. */
if (!lines.length) return Promise.resolve(null);
return Promise.resolve('<div class="popup-date">' + lines.join('') + '</div>'); return Promise.resolve('<div class="popup-date">' + lines.join('') + '</div>');
} }
@ -981,9 +992,10 @@
return isNaN(d.getTime()) ? null : d; return isNaN(d.getTime()) ? null : d;
} }
/* Whole-day difference between two Dates, floored (never negative). */ /* Whole-day difference b - a, floored. Negative when b precedes a,
so callers can detect future dates instead of mislabelling them. */
function daysBetween(a, b) { function daysBetween(a, b) {
var ms = Math.abs(b.getTime() - a.getTime()); var ms = b.getTime() - a.getTime();
return Math.floor(ms / 86400000); return Math.floor(ms / 86400000);
} }
@ -1005,9 +1017,12 @@
return (approx ? '~' : '') + y + ' year' + (y === 1 ? '' : 's'); return (approx ? '~' : '') + y + ' year' + (y === 1 ? '' : 's');
} }
/* Past-tense phrasing for a date N days in the past. */ /* Past-tense phrasing for a date N days in the past. Returns '' for
future dates (negative N), mirroring now.js, so callers render
nothing rather than a false "N days ago". */
function humanAgo(days) { function humanAgo(days) {
if (days <= 0) return 'today'; if (days < 0) return ''; /* future / clock skew */
if (days === 0) return 'today';
if (days === 1) return 'yesterday'; if (days === 1) return 'yesterday';
if (days < 14) return days + ' days ago'; if (days < 14) return days + ' days ago';
return humanDuration(days, true) + ' ago'; return humanDuration(days, true) + ' ago';
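
The sign handling introduced in this hunk can be exercised in isolation. The sketch below restates `daysBetween` and `humanAgo` as they read on the new side of the diff; `approxDuration` is a hypothetical stand-in for `humanDuration`, whose full body is not shown here, so the long-range phrasing is an assumption rather than the site's actual output.

```javascript
// Signed whole-day difference b - a, floored; negative when b precedes a.
function daysBetween(a, b) {
    var ms = b.getTime() - a.getTime();
    return Math.floor(ms / 86400000);
}

// Hypothetical stand-in for humanDuration(days, true): rounds to weeks only.
function approxDuration(days) {
    var w = Math.round(days / 7);
    return '~' + w + ' week' + (w === 1 ? '' : 's');
}

// Past-tense phrasing; '' for future dates so callers render nothing.
function humanAgo(days) {
    if (days < 0) return '';
    if (days === 0) return 'today';
    if (days === 1) return 'yesterday';
    if (days < 14) return days + ' days ago';
    return approxDuration(days) + ' ago';
}

var today = new Date(Date.UTC(2026, 5, 10));
var future = new Date(Date.UTC(2026, 5, 12));

// A start date two days ahead yields a negative count and an empty label.
console.log(daysBetween(future, today));
console.log(JSON.stringify(humanAgo(daysBetween(future, today))));
```

With the previous `Math.abs` version the same input would have produced a positive count and a misleading "2 days ago".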


@ -23,6 +23,9 @@
/* Read ?p= from the query string for deep linking. */ /* Read ?p= from the query string for deep linking. */
var qs = new URLSearchParams(window.location.search); var qs = new URLSearchParams(window.location.search);
/* Keep the canonical URL clean on plain loads: only sync ?p= back to
the URL when one was already present or the user navigates. */
var syncUrl = qs.has('p');
var initPage = parseInt(qs.get('p'), 10); var initPage = parseInt(qs.get('p'), 10);
if (!isNaN(initPage) && initPage >= 1 && initPage <= pageCount) { if (!isNaN(initPage) && initPage >= 1 && initPage <= pageCount) {
currentPage = initPage; currentPage = initPage;
@ -47,7 +50,7 @@
/* Replace URL so the page is bookmarkable at the current position. /* Replace URL so the page is bookmarkable at the current position.
The back button still returns to the landing page. */ The back button still returns to the landing page. */
history.replaceState(null, '', '?p=' + currentPage); if (syncUrl) history.replaceState(null, '', '?p=' + currentPage);
/* Preload the adjacent pages for smooth turning. */ /* Preload the adjacent pages for smooth turning. */
if (currentPage > 1) new Image().src = pages[currentPage - 2]; if (currentPage > 1) new Image().src = pages[currentPage - 2];
@ -132,4 +135,5 @@
------------------------------------------------------------------ */ ------------------------------------------------------------------ */
navigate(currentPage); navigate(currentPage);
syncUrl = true; /* any later navigate() is a user action — sync from here on */
}()); }());


@ -113,12 +113,17 @@
/* ---- URL extraction ---- */ /* ---- URL extraction ---- */
/* Normalise a URL to a pathname for lookup in epistemicMeta. /* Normalise a URL to a pathname for lookup in epistemicMeta.
Pagefind results use full URLs; semantic results use relative paths. */ Pagefind results use full URLs; semantic results use relative paths.
epistemicMeta keys are emitted as routed paths (".../index.html"),
while result links use the clean directory form (".../"), so the
trailing-slash form must be expanded before lookup. */
function normUrl(href) { function normUrl(href) {
if (!href) return null; if (!href) return null;
try { try {
var u = new URL(href, window.location.origin); var u = new URL(href, window.location.origin);
return u.pathname; var p = u.pathname;
if (p.charAt(p.length - 1) === '/') p += 'index.html';
return p;
} catch (e) { } catch (e) {
return href; return href;
} }
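
The trailing-slash expansion can be checked outside the browser. This is a minimal sketch of the new `normUrl`, assuming the origin is passed in as a parameter (`origin` is an addition for illustration; the original reads `window.location.origin`) and using `https://example.org` as a placeholder host.

```javascript
// Normalise a result link to the routed path used as an epistemicMeta key.
// `origin` is a hypothetical parameter replacing window.location.origin.
function normUrl(href, origin) {
    if (!href) return null;
    try {
        var p = new URL(href, origin).pathname;
        // Clean directory form (".../") becomes the routed ".../index.html".
        if (p.charAt(p.length - 1) === '/') p += 'index.html';
        return p;
    } catch (e) {
        return href;
    }
}

var ORIGIN = 'https://example.org';
console.log(normUrl('/essays/foo/', ORIGIN));
console.log(normUrl('https://example.org/essays/foo/index.html', ORIGIN));
```

Both calls should resolve to the same key, which is what lets Pagefind results (full URLs) and semantic results (relative paths) share one lookup table.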
@ -268,7 +273,12 @@
if (!el) return; if (!el) return;
el.addEventListener('input', function () { el.addEventListener('input', function () {
var v = el.value.trim(); var v = el.value.trim();
state[field] = v !== '' ? Math.max(0, Math.min(100, parseInt(v, 10) || 0)) : null; var n = parseInt(v, 10);
/* Non-numeric input deactivates the filter (null) rather
than coercing to an always-matching >= 0 threshold. */
state[field] = (v !== '' && !isNaN(n))
? Math.max(0, Math.min(100, n))
: null;
loadMeta().then(applyFilters); loadMeta().then(applyFilters);
}); });
}); });
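
The threshold parsing above can be pulled into a pure function to make the edge cases visible. A minimal sketch follows; the name `parseThreshold` is hypothetical and does not appear in the source.

```javascript
// Returns a clamped 0..100 integer, or null when the filter should be inactive.
function parseThreshold(raw) {
    var v = String(raw).trim();
    var n = parseInt(v, 10);
    // Empty or non-numeric input deactivates the filter instead of
    // coercing to 0, which would match every result.
    return (v !== '' && !isNaN(n)) ? Math.max(0, Math.min(100, n)) : null;
}

console.log(parseThreshold('abc'));  // non-numeric
console.log(parseThreshold('150'));  // clamped to the upper bound
console.log(parseThreshold(' 42 ')); // surrounding whitespace is trimmed
```

Note that `parseInt` still accepts a numeric prefix, so an input such as `12abc` is treated as 12; the change only affects input with no leading digits.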


@ -7,11 +7,18 @@
'use strict'; 'use strict';
window.addEventListener('DOMContentLoaded', function () { window.addEventListener('DOMContentLoaded', function () {
var ui = new PagefindUI({ /* If the Pagefind bundle failed to load (e.g. 404), skip only the
element: '#search', Pagefind setup; the rest of this handler must still run. */
showImages: false, var ui = null;
excerptLength: 30, if (typeof PagefindUI === 'undefined') {
}); console.warn('search.js: PagefindUI not loaded — keyword search disabled.');
} else {
ui = new PagefindUI({
element: '#search',
showImages: false,
excerptLength: 30,
});
}
/* Timing instrumentation ------------------------------------------ */ /* Timing instrumentation ------------------------------------------ */
var timingEl = document.getElementById('search-timing'); var timingEl = document.getElementById('search-timing');
@ -46,7 +53,7 @@
/* Pre-fill from URL parameter and trigger the search -------------- */ /* Pre-fill from URL parameter and trigger the search -------------- */
var params = new URLSearchParams(window.location.search); var params = new URLSearchParams(window.location.search);
var q = params.get('q'); var q = params.get('q');
if (q) { if (q && ui) {
startTime = performance.now(); startTime = performance.now();
ui.triggerSearch(q); ui.triggerSearch(q);
} }


@ -88,6 +88,15 @@
} }
function onKeyUp(e) { function onKeyUp(e) {
/* Typing capitals in the annotation picker's note input (or any
other editable field) releases Shift; don't re-summon the
toolbar over the UI the user is typing into. */
var t = e.target;
if (t && t.nodeType === Node.ELEMENT_NODE) {
if (popup.contains(t)) return;
if (picker && picker.contains(t)) return;
if (t.isContentEditable || t.closest('input, textarea')) return;
}
if (e.shiftKey || e.key === 'End' || e.key === 'Home') { if (e.shiftKey || e.key === 'End' || e.key === 'Home') {
clearTimeout(showTimer); clearTimeout(showTimer);
showTimer = setTimeout(tryShow, SHOW_DELAY); showTimer = setTimeout(tryShow, SHOW_DELAY);


@ -39,10 +39,17 @@
Index loading: fetch once, lazily Index loading: fetch once, lazily
------------------------------------------------------------------ */ ------------------------------------------------------------------ */
/* In-flight promise so concurrent first searches share a single
index fetch (mirrors loadModelPromise below). Without this guard,
two rapid keystrokes would each fetch semantic-index.bin and
semantic-meta.json before the first resolves. */
var loadIndexPromise = null;
function loadIndex() { function loadIndex() {
if (indexReady) return Promise.resolve(); if (indexReady) return Promise.resolve();
if (loadIndexPromise) return loadIndexPromise;
return Promise.all([ loadIndexPromise = Promise.all([
fetch('/data/semantic-index.bin').then(function (r) { fetch('/data/semantic-index.bin').then(function (r) {
if (!r.ok) throw new Error('semantic-index.bin not found'); if (!r.ok) throw new Error('semantic-index.bin not found');
return r.arrayBuffer(); return r.arrayBuffer();
@ -54,8 +61,23 @@
]).then(function (results) { ]).then(function (results) {
vectors = new Float32Array(results[0]); vectors = new Float32Array(results[0]);
meta = results[1]; meta = results[1];
/* Consistency check: a stale CDN-cached bin/json pair would
otherwise produce NaN scores and silently garbage ranking. */
if (vectors.length !== meta.length * DIM) {
console.error('semantic-search: index/meta size mismatch ('
+ vectors.length + ' floats vs ' + meta.length + ' × ' + DIM + ')');
vectors = null;
meta = null;
throw new Error('semantic index not available: index/meta size mismatch');
}
indexReady = true; indexReady = true;
}).catch(function (err) {
/* Allow a retry on the next call instead of caching the
failed promise forever. */
loadIndexPromise = null;
throw err;
}); });
return loadIndexPromise;
} }
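
The in-flight promise guard in `loadIndex` is a general pattern and can be sketched independently of the index files. In the example below, `makeSharedLoader` and the counting loader are hypothetical names used only for illustration; the real code fetches `semantic-index.bin` and `semantic-meta.json` instead.

```javascript
// Wrap an async loader so concurrent first calls share one in-flight
// promise, and a failed load clears the cache so a later call can retry.
function makeSharedLoader(load) {
    var ready = false;
    var inflight = null;
    return function () {
        if (ready) return Promise.resolve();
        if (inflight) return inflight;
        inflight = load().then(function () {
            ready = true;
        }).catch(function (err) {
            inflight = null; // do not cache the failed promise forever
            throw err;
        });
        return inflight;
    };
}

var fetchCount = 0;
var loadOnce = makeSharedLoader(function () {
    fetchCount += 1; // stands in for the real fetch() calls
    return Promise.resolve();
});

// Two calls issued back to back, before the first resolves.
var p1 = loadOnce();
var p2 = loadOnce();
console.log(p1 === p2, fetchCount);
```

Both callers receive the same promise and the underlying loader runs once, which is the behaviour the comment in the diff describes for two rapid keystrokes.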
/* ------------------------------------------------------------------ /* ------------------------------------------------------------------
@ -114,14 +136,23 @@
}); });
} }
/* Generation token: each runSearch call invalidates all still-in-flight
predecessors, so a stale (earlier) query's results can never render
after a newer query's. */
var searchGeneration = 0;
function runSearch(query) { function runSearch(query) {
var gen = ++searchGeneration;
query = query.trim(); query = query.trim();
if (!query) { clearResults(); return; } if (!query) { clearResults(); return; }
setStatus('Searching…'); setStatus('Searching…');
var indexPromise = loadIndex().catch(function (err) { var indexPromise = loadIndex().catch(function (err) {
setStatus('Semantic index not available — run make build first.'); if (gen === searchGeneration) {
setStatus('Semantic index not available — run make build first.');
}
throw err; throw err;
}); });
var modelPromise = loadModel(); var modelPromise = loadModel();
@ -130,12 +161,14 @@
var pipe = results[1]; var pipe = results[1];
return pipe(query, { pooling: 'mean', normalize: true }); return pipe(query, { pooling: 'mean', normalize: true });
}).then(function (output) { }).then(function (output) {
if (gen !== searchGeneration) return; /* superseded by a newer query */
var queryVec = output.data; /* Float32Array, length 384 */ var queryVec = output.data; /* Float32Array, length 384 */
var scores = cosineSims(queryVec); var scores = cosineSims(queryVec);
var hits = topK(scores); var hits = topK(scores);
renderResults(hits); renderResults(hits);
setStatus(hits.length ? '' : 'No results found.'); setStatus(hits.length ? '' : 'No results found.');
}).catch(function (err) { }).catch(function (err) {
if (gen !== searchGeneration) return; /* superseded by a newer query */
if (err.message && err.message.indexOf('not available') === -1) { if (err.message && err.message.indexOf('not available') === -1) {
setStatus('Search error — see console for details.'); setStatus('Search error — see console for details.');
console.error('semantic-search:', err); console.error('semantic-search:', err);
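
The generation-token check used by `runSearch` reduces to a counter and a comparison. The sketch below isolates that logic; `createGenerationGuard`, `begin`, and `isCurrent` are hypothetical names, since the diff inlines the same idea with `searchGeneration` and a local `gen`.

```javascript
// Each begin() invalidates every token handed out before it.
function createGenerationGuard() {
    var current = 0;
    return {
        begin: function () { current += 1; return current; },
        isCurrent: function (token) { return token === current; }
    };
}

var guard = createGenerationGuard();
var first = guard.begin();  // e.g. query "ab"
var second = guard.begin(); // e.g. query "abc", issued before "ab" finished

// When the slower first query finally resolves, its token is stale,
// so its handler should return without rendering.
console.log(guard.isCurrent(first), guard.isCurrent(second)); // false true
```

In an async handler the check goes at the top of each continuation (`if (!guard.isCurrent(token)) return;`), mirroring the `gen !== searchGeneration` guards in both the `.then` and `.catch` branches.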


@ -108,11 +108,26 @@
} }
} }
function loadTransclusion(el) { /* Nested transclusion limits: ancestors carries the chain of srcs
* currently being expanded (cycle guard: a self-transcluding page
* must not loop), and MAX_DEPTH caps pathological nesting. */
var MAX_DEPTH = 3;
function loadTransclusion(el, depth, ancestors) {
depth = depth || 0;
ancestors = ancestors || [];
var src = el.dataset.src; var src = el.dataset.src;
var section = el.dataset.section || null; var section = el.dataset.section || null;
if (!src) return; if (!src) return;
if (depth >= MAX_DEPTH || ancestors.indexOf(src) !== -1) {
el.classList.add('transclude--error');
el.textContent = '[transclusion omitted (cycle or depth limit): '
+ src + (section ? '#' + section : '') + ']';
return;
}
el.classList.add('transclude--loading'); el.classList.add('transclude--loading');
fetchPage(src) fetchPage(src)
@ -138,6 +153,14 @@
el.classList.replace('transclude--loading', 'transclude--loaded'); el.classList.replace('transclude--loading', 'transclude--loaded');
el.appendChild(wrapper); el.appendChild(wrapper);
/* The fetched page may itself contain transclusion
placeholders; process them too, extending the
ancestor chain for cycle/depth guarding. */
var chain = ancestors.concat(src);
wrapper.querySelectorAll('div.transclude').forEach(function (nested) {
loadTransclusion(nested, depth + 1, chain);
});
reinitFragment(el); reinitFragment(el);
}) })
.catch(function (err) { .catch(function (err) {
@ -147,6 +170,8 @@
} }
document.addEventListener('DOMContentLoaded', function () { document.addEventListener('DOMContentLoaded', function () {
document.querySelectorAll('div.transclude').forEach(loadTransclusion); document.querySelectorAll('div.transclude').forEach(function (el) {
loadTransclusion(el);
});
}); });
}()); }());
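
The cycle and depth guards can be demonstrated without the DOM or network. This sketch expands an in-memory page table synchronously; the `pages` shape (`text` plus `includes`) and the `expand` function are assumptions made for illustration, while `MAX_DEPTH` and the ancestor-chain check follow the diff.

```javascript
var MAX_DEPTH = 3;

// Recursively expand `src`, refusing cycles and over-deep nesting.
function expand(src, pages, depth, ancestors) {
    depth = depth || 0;
    ancestors = ancestors || [];
    if (depth >= MAX_DEPTH || ancestors.indexOf(src) !== -1) {
        return '[omitted: ' + src + ']';
    }
    var page = pages[src];
    if (!page) return '[missing: ' + src + ']';
    // Extend the chain with the current src, without mutating the caller's copy.
    var chain = ancestors.concat(src);
    return page.text + page.includes.map(function (child) {
        return ' ' + expand(child, pages, depth + 1, chain);
    }).join('');
}

// a includes b, and b includes a again: a two-page cycle.
var cyclic = {
    a: { text: 'A', includes: ['b'] },
    b: { text: 'B', includes: ['a'] }
};
console.log(expand('a', cyclic)); // A B [omitted: a]
```

Because the chain is per-branch rather than global, the same page may still be transcluded twice in sibling positions; only a page that appears among its own ancestors is rejected.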


@ -94,6 +94,9 @@
function isDark() { function isDark() {
var t = document.documentElement.dataset.theme; var t = document.documentElement.dataset.theme;
if (t === 'dark') return true; if (t === 'dark') return true;
/* cappuccino is a dark-brown theme (light text on #553a28); charts
need the dark palette or axis labels become unreadable. */
if (t === 'cappuccino') return true;
if (t === 'light') return false; if (t === 'light') return false;
return window.matchMedia('(prefers-color-scheme: dark)').matches; return window.matchMedia('(prefers-color-scheme: dark)').matches;
} }

7
static/logo-sprite.svg Normal file

File diff suppressed because one or more lines are too long

Size: 32 KiB

BIN
static/og-image.png Normal file

Binary file not shown.

Size: 79 KiB


@ -1,13 +1,28 @@
{ {
"name": "levineuwirth.org", "name": "Levi Neuwirth",
"short_name": "ln", "short_name": "ln",
"description": "Personal site of Levi Neuwirth — essays, research, music, and photography.",
"start_url": "/",
"scope": "/",
"icons": [ "icons": [
{
"src": "/web-app-manifest-192x192.png",
"sizes": "192x192",
"type": "image/png",
"purpose": "any"
},
{ {
"src": "/web-app-manifest-192x192.png", "src": "/web-app-manifest-192x192.png",
"sizes": "192x192", "sizes": "192x192",
"type": "image/png", "type": "image/png",
"purpose": "maskable" "purpose": "maskable"
}, },
{
"src": "/web-app-manifest-512x512.png",
"sizes": "512x512",
"type": "image/png",
"purpose": "any"
},
{ {
"src": "/web-app-manifest-512x512.png", "src": "/web-app-manifest-512x512.png",
"sizes": "512x512", "sizes": "512x512",
@ -15,7 +30,7 @@
"purpose": "maskable" "purpose": "maskable"
} }
], ],
"theme_color": "#ffffff", "theme_color": "#16140f",
"background_color": "#ffffff", "background_color": "#16140f",
"display": "standalone" "display": "standalone"
} }

Binary file not shown.

Size: 1.3 KiB before, 22 KiB after

Binary file not shown.

Size: 7.8 KiB before, 106 KiB after


@ -17,25 +17,28 @@
$body$ $body$
$if(backlinks)$ $if(backlinks)$
<footer class="page-meta-footer"> <footer class="page-meta-footer">
$else$
$if(similar-links)$
<footer class="page-meta-footer">
$endif$
$endif$
$if(backlinks)$
<div class="meta-footer-full meta-footer-backlinks" id="backlinks"> <div class="meta-footer-full meta-footer-backlinks" id="backlinks">
<h3>Backlinks</h3> <h3>Backlinks</h3>
$backlinks$ $backlinks$
</div> </div>
$if(similar-links)$ $endif$
$if(similar-links)$
<div class="meta-footer-full meta-footer-similar" id="similar-links"> <div class="meta-footer-full meta-footer-similar" id="similar-links">
<h3>Related</h3> <h3>Related</h3>
$similar-links$ $similar-links$
</div> </div>
$endif$ $endif$
$if(backlinks)$
</footer> </footer>
$else$ $else$
$if(similar-links)$ $if(similar-links)$
<footer class="page-meta-footer"> </footer>
<div class="meta-footer-full meta-footer-similar" id="similar-links"> $endif$
<h3>Related</h3>
$similar-links$
</div>
</footer>
$endif$
$endif$ $endif$
</main> </main>


@ -14,8 +14,10 @@ $if(home)$<meta property="og:title" content="Levi Neuwirth">$else$$if(title)$<me
$if(description)$<meta property="og:description" content="$description$">$endif$ $if(description)$<meta property="og:description" content="$description$">$endif$
<meta property="og:url" content="$site-url$$url$"> <meta property="og:url" content="$site-url$$url$">
$if(date)$<meta property="og:type" content="article">$else$<meta property="og:type" content="website">$endif$ $if(date)$<meta property="og:type" content="article">$else$<meta property="og:type" content="website">$endif$
<meta property="og:image" content="$site-url$/web-app-manifest-512x512.png"> <meta property="og:image" content="$site-url$/og-image.png">
<meta name="twitter:card" content="summary"> <meta property="og:image:width" content="1200">
<meta property="og:image:height" content="630">
<meta name="twitter:card" content="summary_large_image">
$if(description)$<meta name="twitter:description" content="$description$">$endif$ $if(description)$<meta name="twitter:description" content="$description$">$endif$
<link rel="icon" type="image/png" href="/favicon-96x96.png" sizes="96x96"> <link rel="icon" type="image/png" href="/favicon-96x96.png" sizes="96x96">


@ -2,7 +2,13 @@
<nav class="site-nav"> <nav class="site-nav">
<!-- Row 1: primary links --> <!-- Row 1: primary links -->
<div class="nav-row-primary"> <div class="nav-row-primary">
<a href="/" class="nav-logo" aria-label="Home"></a> <!-- The mark lives in /logo-sprite.svg and is referenced via
<use> instead of being inlined: the traced path is ~33 KB,
and a per-page inline copy would dwarf most documents. CSS
custom properties (--logo-ink/--logo-bg) cascade into the
use-element shadow tree, so the two-tone cutout still
renders. -->
<a href="/" class="nav-logo" aria-label="Home"><svg class="nav-logo__mark" aria-hidden="true" focusable="false"><use href="/logo-sprite.svg#logo-mark"/></svg></a>
<div class="nav-primary"> <div class="nav-primary">
<a href="/">Home</a> <a href="/">Home</a>
<a href="/current.html">Current</a> <a href="/current.html">Current</a>


@ -7,6 +7,9 @@
<link rel="stylesheet" href="/css/base.css"> <link rel="stylesheet" href="/css/base.css">
<link rel="stylesheet" href="/css/components.css"> <link rel="stylesheet" href="/css/components.css">
<link rel="stylesheet" href="/css/score-reader.css"> <link rel="stylesheet" href="/css/score-reader.css">
<!-- utils.js must precede theme.js: theme.js reads saved settings via
window.lnUtils.safeStorage and silently restores nothing without it. -->
<script src="/js/utils.js"></script>
<script src="/js/theme.js"></script> <script src="/js/theme.js"></script>
</head> </head>
<body class="score-reader-page"> <body class="score-reader-page">


@ -49,6 +49,10 @@ EOF
bold "── new popup provider ──" bold "── new popup provider ──"
NAME=$(prompt "slug (lowercase, used as class + data-popup-source key, e.g. 'zenodo'):") NAME=$(prompt "slug (lowercase, used as class + data-popup-source key, e.g. 'zenodo'):")
[[ -z "$NAME" ]] && { warn "slug required"; exit 1; } [[ -z "$NAME" ]] && { warn "slug required"; exit 1; }
# The slug is interpolated into nginx directives (location /proxy/$NAME/,
# set \$upstream_$NAME) — validate like import-photo.sh does so a space,
# ';', or '{' can't produce a config that fails to load.
[[ "$NAME" =~ ^[a-z0-9-]+$ ]] || { warn "slug must match ^[a-z0-9-]+\$"; exit 1; }
LABEL=$(prompt "display label (e.g. 'Zenodo'):") LABEL=$(prompt "display label (e.g. 'Zenodo'):")
[[ -z "$LABEL" ]] && LABEL="$NAME" [[ -z "$LABEL" ]] && LABEL="$NAME"
@ -107,14 +111,16 @@ fi
# ── proxy prefix + upstream host derivation ────────────────────────── # ── proxy prefix + upstream host derivation ──────────────────────────
# UPSTREAM_HOST is derived unconditionally: the no-proxy (direct CORS
# fetch) case is exactly when the host must be added to connect-src, so
# the checklist's CSP reminder below needs it populated either way.
UPSTREAM_HOST=$(printf '%s' "$API_URL" | awk -F/ '{print $3}')
if [[ "$NEEDS_PROXY" -eq 1 ]]; then if [[ "$NEEDS_PROXY" -eq 1 ]]; then
UPSTREAM_HOST=$(printf '%s' "$API_URL" | awk -F/ '{print $3}')
UPSTREAM_PATH=$(printf '%s' "$API_URL" | awk -F/ 'BEGIN{OFS="/"} {$1=""; $2=""; $3=""; print}' | sed 's|^///||') UPSTREAM_PATH=$(printf '%s' "$API_URL" | awk -F/ 'BEGIN{OFS="/"} {$1=""; $2=""; $3=""; print}' | sed 's|^///||')
PROXY_PATH="/proxy/$NAME/" PROXY_PATH="/proxy/$NAME/"
PROXY_API_URL="$PROXY_PATH${UPSTREAM_PATH%%\?*}" PROXY_API_URL="$PROXY_PATH${UPSTREAM_PATH%%\?*}"
[[ "$API_URL" == *"?"* ]] && PROXY_API_URL="$PROXY_API_URL?${API_URL#*\?}" [[ "$API_URL" == *"?"* ]] && PROXY_API_URL="$PROXY_API_URL?${API_URL#*\?}"
else else
UPSTREAM_HOST=""
PROXY_API_URL="$API_URL" PROXY_API_URL="$API_URL"
fi fi
@ -205,8 +211,9 @@ cat <<EOF
EOF EOF
if [[ "$NEEDS_PROXY" -eq 0 && -n "$UPSTREAM_HOST" ]]; then if [[ "$NEEDS_PROXY" -eq 0 && -n "$UPSTREAM_HOST" ]]; then
echo " 5. In static/js/popups.js top-comment: add $UPSTREAM_HOST to the" echo " 5. Add https://$UPSTREAM_HOST to connect-src in"
echo " connect-src CSP list." echo " nginx/security-headers.conf (direct CORS fetches are blocked"
echo " by CSP otherwise), and mirror it in the popups.js top-comment."
fi fi
echo echo
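
The slug check and the unconditional host derivation in this hunk can be restated compactly. The following is a minimal Python sketch, not the scaffolder's code: `validate_slug` and `upstream_host` are hypothetical names, and `urlsplit` stands in for the script's `awk -F/ '{print $3}'`.

```python
import re
from urllib.parse import urlsplit

# Same character class the shell script enforces with [[ "$NAME" =~ ^[a-z0-9-]+$ ]].
SLUG_RE = re.compile(r"[a-z0-9-]+")


def validate_slug(slug: str) -> bool:
    """True only for slugs that are safe to interpolate into an nginx directive."""
    return SLUG_RE.fullmatch(slug) is not None


def upstream_host(api_url: str) -> str:
    """Return the host[:port] part of the URL, derived whether or not a proxy is used."""
    return urlsplit(api_url).netloc
```

Deriving the host before the proxy branch is what lets the CSP reminder fire in the direct-fetch case, where `connect-src` actually needs the entry.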

View File

@ -104,6 +104,30 @@ def err(msg: str) -> None:
print(f"[archive] ERROR: {msg}", file=sys.stderr) print(f"[archive] ERROR: {msg}", file=sys.stderr)
def atomic_write_text(path: Path, text: str) -> None:
"""Write to a PID-unique temp then os.replace. PROVENANCE.json and
the generated index/state files are integrity records: an interrupt
mid-write must never leave a truncated file that the next run parses
(or mistakes for corruption); fsync makes the rename durable and the
PID suffix keeps concurrent runs from sharing a temp file."""
path.parent.mkdir(parents=True, exist_ok=True)
tmp = path.with_suffix(path.suffix + f".tmp.{os.getpid()}")
try:
with tmp.open("w", encoding="utf-8") as f:
f.write(text)
f.flush()
os.fsync(f.fileno())
os.replace(tmp, path)
except BaseException:
tmp.unlink(missing_ok=True)
raise
def atomic_write_json(path: Path, obj) -> None:
atomic_write_text(
path, json.dumps(obj, indent=2, ensure_ascii=False) + "\n")
# --------------------------------------------------------------------------- # ---------------------------------------------------------------------------
# Manifest / removed.yaml # Manifest / removed.yaml
# --------------------------------------------------------------------------- # ---------------------------------------------------------------------------
@ -119,6 +143,15 @@ def load_yaml_list(path: Path) -> list[dict]:
if not isinstance(data, list): if not isinstance(data, list):
err(f"{path.name}: expected a YAML list, got {type(data).__name__}") err(f"{path.name}: expected a YAML list, got {type(data).__name__}")
sys.exit(1) sys.exit(1)
# Validate items too: a stray scalar line (`- https://example.com`
# instead of `- url: ...`) would otherwise surface much later as an
# AttributeError deep inside fetch/wayback/check.
for i, item in enumerate(data):
if not isinstance(item, dict):
err(f"{path.name}: entry {i + 1} is not a mapping "
f"(got {type(item).__name__}: {item!r}); "
f"each entry must be `- url: ...`")
sys.exit(1)
return data return data
@ -241,7 +274,10 @@ def extract_text_pdf(pdf: Path, txt: Path) -> None:
"""Extract plain text from `pdf` into `txt` via pdftotext. On any """Extract plain text from `pdf` into `txt` via pdftotext. On any
failure an empty file is written so downstream steps still find it.""" failure an empty file is written so downstream steps still find it."""
try: try:
subprocess.run(["pdftotext", "-q", str(pdf), str(txt)], check=True) # `--` ends option parsing so a slug starting with `-` cannot be
# mistaken for a pdftotext option.
subprocess.run(["pdftotext", "-q", "--", str(pdf), str(txt)],
check=True)
except (subprocess.CalledProcessError, FileNotFoundError) as exc: except (subprocess.CalledProcessError, FileNotFoundError) as exc:
err(f"{pdf.name}: pdftotext failed ({exc}); writing empty text sidecar") err(f"{pdf.name}: pdftotext failed ({exc}); writing empty text sidecar")
txt.write_text("", encoding="utf-8") txt.write_text("", encoding="utf-8")
@ -263,6 +299,51 @@ def find_monolith() -> str | None:
return shutil.which("monolith") return shutil.which("monolith")
MONOLITH_VERSION_FILE = REPO_ROOT / "tools" / "monolith-version.txt"
# Binaries already verified this run — the pin check hashes the binary
# once, not once per snapshot.
_monolith_verified: set[str] = set()
def _pinned_monolith_sha256() -> str | None:
"""Parse the `sha256 = <hex>` line from tools/monolith-version.txt.
Returns None when the file is missing or unparseable (the caller
warns and continues; only a *mismatch* is fatal)."""
try:
text = MONOLITH_VERSION_FILE.read_text(encoding="utf-8")
except OSError:
return None
m = re.search(r"^\s*sha256\s*=\s*([0-9a-fA-F]{64})\s*$",
text, re.MULTILINE)
return m.group(1).lower() if m else None
def verify_monolith(mono: str) -> None:
"""Integrity gate for the snapshot tool itself: the binary that
produces committed artifacts must match the SHA-256 pinned in
tools/monolith-version.txt. A mismatch is an integrity error (print
loudly, exit non-zero, halt `make build`); a missing or unparseable
version file is a warning only."""
if mono in _monolith_verified:
return
pinned = _pinned_monolith_sha256()
if pinned is None:
print(f"[archive] WARNING: {MONOLITH_VERSION_FILE.name} is missing "
f"or has no parseable `sha256 = …` line — monolith binary "
f"integrity NOT verified ({mono})", file=sys.stderr)
_monolith_verified.add(mono)
return
live = sha256_of(Path(mono))
if live != pinned:
err(f"monolith binary {mono} fails SHA-256 verification "
f"(pinned {pinned}, found {live}). The snapshot tool's bytes "
f"do not match tools/monolith-version.txt — re-vendor the "
f"binary or update the pin (see that file's instructions).")
sys.exit(1)
_monolith_verified.add(mono)
def body_noarchive(path: Path) -> bool: def body_noarchive(path: Path) -> bool:
"""True if the snapshot declares <meta name=robots ... noarchive> — """True if the snapshot declares <meta name=robots ... noarchive> —
the in-document equivalent of the X-Robots-Tag header.""" the in-document equivalent of the X-Robots-Tag header."""
@ -327,6 +408,7 @@ def fetch_html(url: str, dest: Path) -> bool:
f"tools/bin/monolith (see tools/monolith-version.txt) or set " f"tools/bin/monolith (see tools/monolith-version.txt) or set "
f"$MONOLITH_BIN; HTML snapshot skipped") f"$MONOLITH_BIN; HTML snapshot skipped")
return False return False
verify_monolith(mono)
source = dest.with_suffix(dest.suffix + ".source.part") source = dest.with_suffix(dest.suffix + ".source.part")
tmp = dest.with_suffix(dest.suffix + ".part") tmp = dest.with_suffix(dest.suffix + ".part")
@ -715,10 +797,7 @@ def cmd_fetch() -> int:
"snapshot-quality": quality, "snapshot-quality": quality,
"wayback": None, "wayback": None,
} }
prov_path.write_text( atomic_write_json(prov_path, prov)
json.dumps(prov, indent=2, ensure_ascii=False) + "\n",
encoding="utf-8",
)
log(f"{slug}: archived [{atype}, {quality}] ({prov['bytes']} bytes)") log(f"{slug}: archived [{atype}, {quality}] ({prov['bytes']} bytes)")
# --- contribute to the Hakyll index ------------------------------- # --- contribute to the Hakyll index -------------------------------
@ -730,11 +809,7 @@ def cmd_fetch() -> int:
} }
# archive-index.json is always rewritten to mirror the manifest exactly. # archive-index.json is always rewritten to mirror the manifest exactly.
INDEX_OUT.parent.mkdir(parents=True, exist_ok=True) atomic_write_json(INDEX_OUT, index)
INDEX_OUT.write_text(
json.dumps(index, indent=2, ensure_ascii=False) + "\n",
encoding="utf-8",
)
log(f"wrote {INDEX_OUT.relative_to(REPO_ROOT)} ({len(index)} entries)") log(f"wrote {INDEX_OUT.relative_to(REPO_ROOT)} ({len(index)} entries)")
if skipped: if skipped:
@ -785,14 +860,18 @@ def cmd_refresh(argv: list[str]) -> int:
try: try:
prev = json.loads(prov_path.read_text(encoding="utf-8")) prev = json.loads(prov_path.read_text(encoding="utf-8"))
prev_sha = prev.get("sha256") prev_sha = prev.get("sha256")
prev_artifact = slug_dir / prev.get("artifact", "") prev_art_name = prev.get("artifact") or ""
prev_artifact = slug_dir / prev_art_name
except Exception as exc: # noqa: BLE001 except Exception as exc: # noqa: BLE001
err(f"refresh: cannot parse prior provenance for {slug}: {exc}") err(f"refresh: cannot parse prior provenance for {slug}: {exc}")
return 2 return 2
# The prior snapshot must be committed and clean — otherwise # The prior snapshot must be committed and clean — otherwise
# `previous-sha256` would point at bytes git can no longer give # `previous-sha256` would point at bytes git can no longer give
# back, breaking the auditable replacement contract. # back, breaking the auditable replacement contract. The empty-
if not prev_sha or not prev_artifact.exists(): # artifact guard matters: without it prev_artifact would be the
# slug directory itself, which exists() accepts and sha256_of
# then crashes on with IsADirectoryError.
if not prev_sha or not prev_art_name or not prev_artifact.is_file():
err(f"refresh: prior snapshot for {slug} is incomplete; restore " err(f"refresh: prior snapshot for {slug} is incomplete; restore "
f"its artifact and provenance before replacing it.") f"its artifact and provenance before replacing it.")
return 2 return 2
@ -850,11 +929,7 @@ def cmd_refresh(argv: list[str]) -> int:
if art_name and (slug_dir / art_name).exists(): if art_name and (slug_dir / art_name).exists():
if prev_sha: if prev_sha:
new_prov["previous-sha256"] = prev_sha new_prov["previous-sha256"] = prev_sha
prov_path.write_text( atomic_write_json(prov_path, new_prov)
json.dumps(new_prov, indent=2,
ensure_ascii=False) + "\n",
encoding="utf-8",
)
log(f"refresh: recorded previous-sha256 " log(f"refresh: recorded previous-sha256 "
f"{prev_sha[:12]}") f"{prev_sha[:12]}")
succeeded = True succeeded = True
@ -893,8 +968,12 @@ def wayback_save(url: str) -> None:
"""Trigger a fresh Wayback capture via Save Page Now. Best-effort: any """Trigger a fresh Wayback capture via Save Page Now. Best-effort: any
outcome is tolerated; the resulting URL is read back via the outcome is tolerated; the resulting URL is read back via the
availability API (which also surfaces a pre-existing capture).""" availability API (which also surfaces a pre-existing capture)."""
req = urllib.request.Request("https://web.archive.org/save/" + url, # Quote only what can't appear raw in a request line (spaces,
headers={"User-Agent": USER_AGENT}) # control chars); URL structure (:/?&=#) passes through so Save
# Page Now sees the original URL shape.
req = urllib.request.Request(
"https://web.archive.org/save/" + quote(url, safe=":/?&=#"),
headers={"User-Agent": USER_AGENT})
try: try:
with urllib.request.urlopen(req, timeout=WAYBACK_TIMEOUT): with urllib.request.urlopen(req, timeout=WAYBACK_TIMEOUT):
pass pass
@ -951,10 +1030,7 @@ def cmd_wayback() -> int:
capture = wayback_lookup(url) capture = wayback_lookup(url)
if capture: if capture:
prov["wayback"] = capture prov["wayback"] = capture
prov_path.write_text( atomic_write_json(prov_path, prov)
json.dumps(prov, indent=2, ensure_ascii=False) + "\n",
encoding="utf-8",
)
log(f"{slug}: wayback -> {capture}") log(f"{slug}: wayback -> {capture}")
backfilled += 1 backfilled += 1
else: else:
@ -1073,11 +1149,7 @@ def cmd_check() -> int:
note = f" -> {new_url}" if new_url else "" note = f" -> {new_url}" if new_url else ""
log(f"check: {url} [{rec['status']}]{note}") log(f"check: {url} [{rec['status']}]{note}")
STATE_OUT.parent.mkdir(parents=True, exist_ok=True) atomic_write_json(STATE_OUT, state)
STATE_OUT.write_text(
json.dumps(state, indent=2, ensure_ascii=False) + "\n",
encoding="utf-8",
)
log(f"check: {tally['live']} live, {tally['moved']} moved, " log(f"check: {tally['live']} live, {tally['moved']} moved, "
f"{tally['error']} error, {tally['rotted']} rotted " f"{tally['error']} error, {tally['rotted']} rotted "
f"-> {STATE_OUT.relative_to(REPO_ROOT)}") f"-> {STATE_OUT.relative_to(REPO_ROOT)}")
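
The pin check added in this file separates three outcomes: no usable pin (warn and continue), matching hash, and mismatch (fatal). The sketch below reproduces that decision on in-memory bytes. The regular expression is the one from the diff; `parse_pin`, `check_binary`, and the returned status strings are illustrative assumptions rather than the script's API.

```python
import hashlib
import re
from typing import Optional

# Same pattern as _pinned_monolith_sha256 in the diff.
PIN_RE = re.compile(r"^\s*sha256\s*=\s*([0-9a-fA-F]{64})\s*$", re.MULTILINE)


def parse_pin(version_text: str) -> Optional[str]:
    """Extract the lowercase pinned digest, or None if no `sha256 = <hex>` line exists."""
    m = PIN_RE.search(version_text)
    return m.group(1).lower() if m else None


def check_binary(data: bytes, version_text: str) -> str:
    """Classify the binary: 'unverified' (no pin), 'ok', or 'mismatch'."""
    pinned = parse_pin(version_text)
    if pinned is None:
        return "unverified"  # the real script warns here and keeps going
    live = hashlib.sha256(data).hexdigest()
    return "ok" if live == pinned else "mismatch"  # a mismatch halts the build
```

Treating a missing pin as a warning keeps a fresh checkout buildable, while a mismatch can only mean the vendored binary or the pin changed.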

View File

@ -32,7 +32,11 @@ while IFS= read -r -d '' img; do
skipped=$((skipped + 1)) skipped=$((skipped + 1))
else else
echo " webp ${img#"$REPO_ROOT/"}" echo " webp ${img#"$REPO_ROOT/"}"
cwebp -quiet -q 85 "$img" -o "$webp" # Write to a temp name then move: an interrupted cwebp would
# otherwise leave a truncated .webp that is newer than its
# source, which the staleness gate above then skips forever.
cwebp -quiet -q 85 "$img" -o "$webp.part"
mv "$webp.part" "$webp"
converted=$((converted + 1)) converted=$((converted + 1))
fi fi
done < <(find "$REPO_ROOT/static" "$REPO_ROOT/content" \ done < <(find "$REPO_ROOT/static" "$REPO_ROOT/content" \

View File

@ -7,8 +7,9 @@
# the site, no third-party request at view time. # the site, no third-party request at view time.
# #
# Run once before deploying. The vendored copy is gitignored # Run once before deploying. The vendored copy is gitignored
# (~150 KB total); re-running is safe — the script skips when the # (~150 KB total); re-running is safe — files that already exist AND
# files already exist. # match their pinned checksum are skipped; anything missing or
# mismatched is re-fetched.
# #
# To bump the pinned versions, set LEAFLET_VERSION / MARKERCLUSTER_VERSION, # To bump the pinned versions, set LEAFLET_VERSION / MARKERCLUSTER_VERSION,
# re-run, then update tools/leaflet-checksums.sha256 with the new hashes. # re-run, then update tools/leaflet-checksums.sha256 with the new hashes.
@ -39,13 +40,6 @@ files_to_fetch=(
"$UNPKG_MC|MarkerCluster.Default.css|leaflet.markercluster-${MARKERCLUSTER_VERSION}-MarkerCluster.Default.css" "$UNPKG_MC|MarkerCluster.Default.css|leaflet.markercluster-${MARKERCLUSTER_VERSION}-MarkerCluster.Default.css"
) )
# Skip the whole step if the canonical entry-point already exists.
# Force a re-fetch by removing the directory.
if [ -f "$LEAFLET_DIR/leaflet.js" ] && [ -f "$LEAFLET_DIR/leaflet.markercluster.js" ]; then
echo "leaflet: already vendored at $LEAFLET_DIR (skipping)"
exit 0
fi
mkdir -p "$LEAFLET_DIR/images" mkdir -p "$LEAFLET_DIR/images"
verify_or_warn() { verify_or_warn() {
@ -71,15 +65,35 @@ verify_or_warn() {
fi fi
} }
# Per-file skip: existing files are skipped only after re-verifying
# their checksum, so a partial or tampered file from an interrupted
# earlier run can never be silently accepted. Downloads land in a
# .part temp and are only moved into place after verification — a
# failed verification leaves nothing at the final path.
for entry in "${files_to_fetch[@]}"; do for entry in "${files_to_fetch[@]}"; do
IFS='|' read -r url_base local_path pin_key <<<"$entry" IFS='|' read -r url_base local_path pin_key <<<"$entry"
src_name="${local_path##*/}" src_name="${local_path##*/}"
target="$LEAFLET_DIR/$local_path" target="$LEAFLET_DIR/$local_path"
mkdir -p "$(dirname "$target")" mkdir -p "$(dirname "$target")"
if [ -f "$target" ]; then
if verify_or_warn "$target" "$pin_key"; then
echo "leaflet: $local_path present and verified (skipping)"
continue
fi
echo "leaflet: $local_path failed verification — re-fetching" >&2
rm -f "$target"
fi
echo "leaflet: fetching $local_path ($pin_key)" echo "leaflet: fetching $local_path ($pin_key)"
curl -fsSL --progress-bar "$url_base/$src_name" -o "$target" tmp="$target.part"
verify_or_warn "$target" "$pin_key" curl -fsSL --progress-bar "$url_base/$src_name" -o "$tmp"
if ! verify_or_warn "$tmp" "$pin_key"; then
rm -f "$tmp"
echo "leaflet: refusing to vendor unverified $local_path" >&2
exit 1
fi
mv "$tmp" "$target"
done done
echo "leaflet: vendored to $LEAFLET_DIR" echo "leaflet: vendored to $LEAFLET_DIR"
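
The download flow above (fetch to `.part`, verify, then move) is easiest to see without the network involved. This is a hedged Python sketch of the same ordering: `install_verified` is a hypothetical helper, and the `data` argument stands in for whatever `curl` would have written.

```python
import hashlib
import os
from pathlib import Path


def install_verified(data: bytes, target: Path, pinned_sha256: str) -> bool:
    """Write data to target only if it matches the pinned SHA-256.

    The bytes land in a sibling .part file first; a failed verification
    removes it, so nothing unverified ever appears at the final path.
    """
    target.parent.mkdir(parents=True, exist_ok=True)
    tmp = target.with_name(target.name + ".part")
    tmp.write_bytes(data)
    if hashlib.sha256(tmp.read_bytes()).hexdigest() != pinned_sha256.lower():
        tmp.unlink()
        return False
    os.replace(tmp, target)  # atomic within one filesystem
    return True
```

Because the final path is only ever populated by the rename, a later "file exists" check can trust that the file was verified at least once.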

View File

@ -68,8 +68,13 @@ fetch() {
return return
fi fi
echo " fetch $src" echo " fetch $src"
curl -fsSL --progress-bar "$BASE_URL/$src" -o "$dst" # Download to a temp name and move into place only after
verify_sha "$src" "$dst" # verification: an interrupted curl must never leave a partial
# file at the final path, where the present-file skip (or, for an
# unpinned file, nothing at all) would accept it forever.
curl -fsSL --progress-bar "$BASE_URL/$src" -o "$dst.part"
verify_sha "$src" "$dst.part"
mv "$dst.part" "$dst"
} }
if [ ! -f "$CHECKSUMS" ]; then if [ ! -f "$CHECKSUMS" ]; then

View File

@ -5,20 +5,36 @@ embed.py — Build-time embedding pipeline.
Produces two outputs from _site/**/*.html: Produces two outputs from _site/**/*.html:
data/similar-links.json Page-level similarity (for "Related" footer section) data/similar-links.json Page-level similarity (for "Related" footer section)
data/semantic-index.bin Paragraph vectors as raw Float32 array (N × DIM) data/semantic-index.bin Paragraph vectors as raw Float32 array (N × PARA_DIM)
data/semantic-meta.json Paragraph metadata: [{url, title, heading, excerpt}] data/semantic-meta.json Paragraph metadata: [{url, title, heading, excerpt}]
Both use all-MiniLM-L6-v2 (384 dims), the same model shipped to the browser Two models, one process:
via transformers.js for query-time semantic search.
* Pages use nomic-embed-text-v1.5 (768 dims): build-time only, never
shipped to the browser. Chosen for its well-separated cosine scores on
small corpora, which keeps the MIN_SCORE gate meaningful so every essay
reliably gets a "Related" footer section.
* Paragraphs use all-MiniLM-L6-v2 (384 dims): must match what the
browser runs via transformers.js (static/js/semantic-search.js) since
query vectors are dotted against the shipped index.
Called by `make build` when .venv exists. Failures are non-fatal. Called by `make build` when .venv exists. Failures are non-fatal.
Staleness check: skips if all output files are newer than every HTML in _site/.
Staleness: both passes are content-hash cached (data/embed-cache-*.npz),
so an unchanged site re-embeds nothing and loads no model; only the
HTML extraction pass runs. There is deliberately no mtime-based skip:
stamp-build-time.py rewrites every page's footer after this script runs,
so "are outputs newer than the HTML" is always false and a check based
on it can never fire.
""" """
import hashlib
import json import json
import os import os
import re import re
import sys import sys
import zipfile
from pathlib import Path from pathlib import Path
import faiss import faiss
@ -35,13 +51,48 @@ SITE_DIR = REPO_ROOT / "_site"
SIMILAR_OUT = REPO_ROOT / "data" / "similar-links.json" SIMILAR_OUT = REPO_ROOT / "data" / "similar-links.json"
SEMANTIC_BIN = REPO_ROOT / "data" / "semantic-index.bin" SEMANTIC_BIN = REPO_ROOT / "data" / "semantic-index.bin"
SEMANTIC_META = REPO_ROOT / "data" / "semantic-meta.json" SEMANTIC_META = REPO_ROOT / "data" / "semantic-meta.json"
# Content-addressed caches, one per pass. Keyed by sha256 of the (prefixed)
# input text; invalidated wholesale on model name/revision/dim change.
# Gitignored — build artifacts, not source. Survive `make clean`.
PAGE_CACHE = REPO_ROOT / "data" / "embed-cache-pages.npz"
PARA_CACHE = REPO_ROOT / "data" / "embed-cache-paragraphs.npz"
MODEL_NAME = "sentence-transformers/all-MiniLM-L6-v2" # Two models, deliberately split:
# Pinned to a specific HuggingFace commit so a future model bump can't #
# silently change embedding semantics. Bump deliberately when validating # PARA_MODEL — embeds paragraphs for data/semantic-index.bin. This index
# (and re-run a full embed pass to refresh data/semantic-* + similar-links). # is fetched by the browser at /search/ and ranked against query vectors
MODEL_REVISION = "c9745ed1d9f207416be6d2e6f8de32d1f16199bf" # computed client-side. The client (static/js/semantic-search.js) embeds
DIM = 384 # queries with MiniLM-L6-v2 via transformers.js, so the build-time model
# must match exactly — both the architecture and the embedding dimension
# are part of the wire contract.
#
# PAGE_MODEL — embeds full pages for data/similar-links.json. This file
# is consumed only at Hakyll-build time (SimilarLinks.hs) and never
# shipped to the browser, so it is free to use a different, stronger
# model. nomic-embed-text-v1.5 produces well-separated cosine scores on
# small corpora (top neighbours at 0.7–0.9 instead of MiniLM's compressed
# 0.1–0.3), so the MIN_SCORE gate below is meaningful and every essay
# reliably gets a "Related" footer section.
#
# Both pins are deliberate. Bump only when validating and re-run a full
# embed pass to refresh the corresponding output files.
PARA_MODEL_NAME = "sentence-transformers/all-MiniLM-L6-v2"
PARA_MODEL_REVISION = "c9745ed1d9f207416be6d2e6f8de32d1f16199bf"
PARA_DIM = 384
PAGE_MODEL_NAME = "nomic-ai/nomic-embed-text-v1.5"
PAGE_MODEL_REVISION = "e9b6763023c676ca8431644204f50c2b100d9aab"
# The weights repo above declares its modeling code via auto_map in a
# SEPARATE repo (nomic-ai/nomic-bert-2048), which `revision=` does NOT
# pin — without this second pin, trust_remote_code executes whatever is
# at that repo's head at build time.
PAGE_MODEL_CODE_REVISION = "7710840340a098cfb869c4f65e87cf2b1b70caca"
PAGE_DIM = 768
# Nomic requires task-prefixed input. Documents (corpus side) get
# "search_document: "; queries would get "search_query: ". similar-links
# only ever embeds documents, so the prefix is constant here.
PAGE_PREFIX = "search_document: "
TOP_N = 5 # similar-links: neighbours per page TOP_N = 5 # similar-links: neighbours per page
MIN_SCORE = 0.30 # similar-links: discard weak matches MIN_SCORE = 0.30 # similar-links: discard weak matches
@ -69,33 +120,111 @@ PORTAL_BODY_ATTR = "data-portal"
def atomic_write_bytes(path: Path, data: bytes) -> None: def atomic_write_bytes(path: Path, data: bytes) -> None:
"""Write to path.tmp then os.replace, so an interrupt mid-write """Write to a PID-unique temp then os.replace: an interrupt mid-write
cannot leave a truncated file that the next build/serve loads.""" cannot leave a truncated file at the final path, fsync makes the
rename durable across power loss, and the PID suffix keeps two
concurrent runs from interleaving writes into one temp file."""
path.parent.mkdir(parents=True, exist_ok=True) path.parent.mkdir(parents=True, exist_ok=True)
tmp = path.with_suffix(path.suffix + ".tmp") tmp = path.with_suffix(path.suffix + f".tmp.{os.getpid()}")
tmp.write_bytes(data) try:
os.replace(tmp, path) with tmp.open("wb") as f:
f.write(data)
f.flush()
os.fsync(f.fileno())
os.replace(tmp, path)
except BaseException:
tmp.unlink(missing_ok=True)
raise
def atomic_write_text(path: Path, text: str) -> None: def atomic_write_text(path: Path, text: str) -> None:
atomic_write_bytes(path, text.encode("utf-8")) atomic_write_bytes(path, text.encode("utf-8"))
# ---------------------------------------------------------------------------
# Page-embedding cache
# ---------------------------------------------------------------------------
#
# Loading the nomic model and embedding 26 pages on CPU takes ~3 minutes
# every `make build`. Pages rarely change between builds — usually one
# essay is edited and everything else is identical. This cache stores
# one nomic vector per page content hash so unchanged pages are reused
# verbatim and only edited/new pages are re-embedded. A fully-warm cache
# skips the model load entirely.
def content_hash(text: str) -> str:
return hashlib.sha256(text.encode("utf-8")).hexdigest()
def load_vec_cache(path: Path, model: str, revision: str,
dim: int) -> dict[str, np.ndarray]:
"""Load {hash: vector} from disk. Returns an empty dict if the cache
is absent, unreadable, or pinned to a different model; in those
cases save_vec_cache() will overwrite the stale file on next save."""
if not path.exists():
return {}
try:
npz = np.load(path, allow_pickle=False)
if (npz["model"].item() != model or
npz["revision"].item() != revision or
int(npz["dim"].item()) != dim):
return {}
hashes = npz["hashes"]
vectors = npz["vectors"]
if vectors.shape != (len(hashes), dim):
return {}
return {h.item(): vectors[i] for i, h in enumerate(hashes)}
except (OSError, KeyError, ValueError, EOFError,
zipfile.BadZipFile) as e:
print(f"embed.py: cache {path.name} unreadable ({e}) — discarding",
file=sys.stderr)
return {}
def save_vec_cache(path: Path, model: str, revision: str, dim: int,
cache: dict[str, np.ndarray]) -> None:
"""Atomically persist {hash: vector}. Empty cache writes an empty
file so a subsequent load returns {} cleanly (instead of falling
through to the "no file" path)."""
if cache:
hashes = np.array(list(cache.keys()))
vectors = np.stack(list(cache.values())).astype(np.float32)
else:
hashes = np.array([], dtype="U64")
vectors = np.zeros((0, dim), dtype=np.float32)
path.parent.mkdir(parents=True, exist_ok=True)
# Pass an open file handle, not a path: np.savez_compressed appends
# ".npz" to bare paths, which would mangle our atomic-rename target.
# PID-unique temp so concurrent runs can't interleave; fsync so the
# rename is durable.
tmp = path.with_suffix(path.suffix + f".tmp.{os.getpid()}")
try:
with open(tmp, "wb") as f:
np.savez_compressed(
f,
model=model,
revision=revision,
dim=dim,
hashes=hashes,
vectors=vectors,
)
f.flush()
os.fsync(f.fileno())
os.replace(tmp, path)
except BaseException:
tmp.unlink(missing_ok=True)
raise
STRIP_SELECTORS = [ STRIP_SELECTORS = [
"nav", "footer", "#toc", ".link-popup", "script", "style", "nav", "footer", "#toc", ".link-popup", "script", "style",
".page-meta-footer", ".metadata", "[data-pagefind-ignore]", ".page-meta-footer", ".metadata", "[data-pagefind-ignore]",
# The no-JS footnotes fallback duplicates each sidenote's text
# verbatim at the document end — indexing it would double every
# footnote in search results and skew page similarity.
"section.footnotes",
] ]
# ---------------------------------------------------------------------------
# Staleness check
# ---------------------------------------------------------------------------
def needs_update() -> bool:
outputs = [SIMILAR_OUT, SEMANTIC_BIN, SEMANTIC_META]
if not all(p.exists() for p in outputs):
return True
oldest = min(p.stat().st_mtime for p in outputs)
return any(html.stat().st_mtime > oldest for html in SITE_DIR.rglob("*.html"))
# --------------------------------------------------------------------------- # ---------------------------------------------------------------------------
# HTML parsing helpers # HTML parsing helpers
# --------------------------------------------------------------------------- # ---------------------------------------------------------------------------
@ -191,10 +320,6 @@ def main() -> int:
print("embed.py: _site/ not found — skipping", file=sys.stderr) print("embed.py: _site/ not found — skipping", file=sys.stderr)
return 0 return 0
if not needs_update():
print("embed.py: all outputs up to date — skipping")
return 0
# --- Extract pages + paragraphs in one pass --- # --- Extract pages + paragraphs in one pass ---
print("embed.py: extracting pages…") print("embed.py: extracting pages…")
pages = [] pages = []
@ -211,18 +336,44 @@ def main() -> int:
print("embed.py: no indexable pages found", file=sys.stderr) print("embed.py: no indexable pages found", file=sys.stderr)
return 0 return 0
# --- Load model once for both tasks --- # --- Similar-links (page level, nomic, content-hash cached) ---
print(f"embed.py: loading {MODEL_NAME}@{MODEL_REVISION[:8]}") cache = load_vec_cache(PAGE_CACHE, PAGE_MODEL_NAME,
model = SentenceTransformer(MODEL_NAME, revision=MODEL_REVISION) PAGE_MODEL_REVISION, PAGE_DIM)
page_inputs = [PAGE_PREFIX + p["text"] for p in pages]
hashes = [content_hash(t) for t in page_inputs]
miss_idxs = [i for i, h in enumerate(hashes) if h not in cache]
# --- Similar-links (page level) --- print(f"embed.py: pages: {len(pages) - len(miss_idxs)} cached / "
print(f"embed.py: embedding {len(pages)} pages…") f"{len(miss_idxs)} to embed")
page_vecs = model.encode(
[p["text"] for p in pages], if miss_idxs:
normalize_embeddings=True, print(f"embed.py: loading {PAGE_MODEL_NAME}@{PAGE_MODEL_REVISION[:8]}")
show_progress_bar=True, page_model = SentenceTransformer(
batch_size=64, PAGE_MODEL_NAME, revision=PAGE_MODEL_REVISION, trust_remote_code=True,
).astype(np.float32) # code_revision pins the auto_map modeling repo; it must reach
# both AutoConfig and AutoModel.from_pretrained.
model_kwargs={"code_revision": PAGE_MODEL_CODE_REVISION},
config_kwargs={"code_revision": PAGE_MODEL_CODE_REVISION},
)
new_vecs = page_model.encode(
[page_inputs[i] for i in miss_idxs],
normalize_embeddings=True,
show_progress_bar=True,
batch_size=8,
).astype(np.float32)
for i, vec in zip(miss_idxs, new_vecs):
cache[hashes[i]] = vec
# Drop the model before loading MiniLM below; sentence-transformers
# holds the full weight tensor in RAM until GC runs.
del page_model
# Assemble page_vecs in the original pages[] order.
page_vecs = np.stack([cache[h] for h in hashes]).astype(np.float32)
# Prune the cache to only currently-present hashes so a deleted page
# doesn't keep its vector around forever. Then persist.
save_vec_cache(PAGE_CACHE, PAGE_MODEL_NAME, PAGE_MODEL_REVISION,
PAGE_DIM, {h: cache[h] for h in hashes})
index = faiss.IndexFlatIP(page_vecs.shape[1]) index = faiss.IndexFlatIP(page_vecs.shape[1])
index.add(page_vecs) index.add(page_vecs)
@ -245,18 +396,38 @@ def main() -> int:
atomic_write_text(SIMILAR_OUT, json.dumps(similar, ensure_ascii=False, indent=2)) atomic_write_text(SIMILAR_OUT, json.dumps(similar, ensure_ascii=False, indent=2))
print(f"embed.py: wrote {len(similar)} similar-links entries") print(f"embed.py: wrote {len(similar)} similar-links entries")
# --- Semantic index (paragraph level) --- # --- Semantic index (paragraph level, MiniLM, content-hash cached) ---
if not paragraphs: if not paragraphs:
print("embed.py: no paragraphs extracted — skipping semantic index") print("embed.py: no paragraphs extracted — skipping semantic index")
return 0 return 0
print(f"embed.py: embedding {len(paragraphs)} paragraphs…") pcache = load_vec_cache(PARA_CACHE, PARA_MODEL_NAME,
para_vecs = model.encode( PARA_MODEL_REVISION, PARA_DIM)
[p["text"] for p in paragraphs], para_inputs = [p["text"] for p in paragraphs]
normalize_embeddings=True, para_hashes = [content_hash(t) for t in para_inputs]
show_progress_bar=True, para_miss = [i for i, h in enumerate(para_hashes) if h not in pcache]
batch_size=64,
).astype(np.float32) print(f"embed.py: paragraphs: {len(paragraphs) - len(para_miss)} cached / "
f"{len(para_miss)} to embed")
if para_miss:
print(f"embed.py: loading {PARA_MODEL_NAME}@{PARA_MODEL_REVISION[:8]}")
para_model = SentenceTransformer(PARA_MODEL_NAME,
revision=PARA_MODEL_REVISION)
new_para_vecs = para_model.encode(
[para_inputs[i] for i in para_miss],
normalize_embeddings=True,
show_progress_bar=True,
batch_size=64,
).astype(np.float32)
for i, vec in zip(para_miss, new_para_vecs):
pcache[para_hashes[i]] = vec
del para_model
# Assemble in original paragraph order; prune + persist the cache.
para_vecs = np.stack([pcache[h] for h in para_hashes]).astype(np.float32)
save_vec_cache(PARA_CACHE, PARA_MODEL_NAME, PARA_MODEL_REVISION,
PARA_DIM, {h: pcache[h] for h in para_hashes})
atomic_write_bytes(SEMANTIC_BIN, para_vecs.tobytes()) atomic_write_bytes(SEMANTIC_BIN, para_vecs.tobytes())
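
Both passes in this file follow the same pattern: hash each input, embed only the misses, reassemble in input order, and prune the cache to what is still present. A stdlib-only sketch of that pattern follows. `embed_with_cache` is a hypothetical name, and the `embed` callable stands in for `SentenceTransformer.encode`; real vectors would be NumPy arrays, not lists.

```python
import hashlib
from typing import Callable, Dict, List, Tuple

Vector = List[float]


def content_hash(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def embed_with_cache(
    texts: List[str],
    cache: Dict[str, Vector],
    embed: Callable[[List[str]], List[Vector]],
) -> Tuple[List[Vector], Dict[str, Vector], int]:
    """Return (vectors in input order, pruned cache, number of texts embedded)."""
    hashes = [content_hash(t) for t in texts]
    miss = [i for i, h in enumerate(hashes) if h not in cache]
    if miss:  # a fully warm cache never calls embed, so no model load is needed
        for i, vec in zip(miss, embed([texts[i] for i in miss])):
            cache[hashes[i]] = vec
    vectors = [cache[h] for h in hashes]
    pruned = {h: cache[h] for h in hashes}  # drop entries for deleted inputs
    return vectors, pruned, len(miss)
```

Keying on the hash of the prefixed text, as the diff does for pages, means a change to the prefix also invalidates the affected entries.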

View File

@@ -31,6 +31,7 @@ images are logged and the rest of the walk continues.
 from __future__ import annotations
+import os
 import sys
 from pathlib import Path
 from typing import Any
@@ -62,13 +63,20 @@ def _is_stale(image: Path, sidecar: Path) -> bool:
 def _atomic_write_yaml(path: Path, data: dict[str, Any]) -> None:
-    tmp = path.with_suffix(path.suffix + ".tmp")
-    with tmp.open("w", encoding="utf-8") as f:
-        # Preserve a stable key order (width before height) so a manual
-        # diff stays easy to read across regenerations.
-        ordered = {k: data[k] for k in ("width", "height") if k in data}
-        yaml.safe_dump(ordered, f, sort_keys=False, allow_unicode=True)
-    tmp.replace(path)
+    # PID-unique temp (concurrent runs can't share it), removed on
+    # failure. No fsync: sidecars are regenerated from the photo on the
+    # next build, so a lost rename costs one re-extraction, not data.
+    tmp = path.with_suffix(path.suffix + f".tmp.{os.getpid()}")
+    try:
+        with tmp.open("w", encoding="utf-8") as f:
+            # Preserve a stable key order (width before height) so a manual
+            # diff stays easy to read across regenerations.
+            ordered = {k: data[k] for k in ("width", "height") if k in data}
+            yaml.safe_dump(ordered, f, sort_keys=False, allow_unicode=True)
+        tmp.replace(path)
+    except BaseException:
+        tmp.unlink(missing_ok=True)
+        raise
 def _read_dimensions(image: Path) -> dict[str, int]:
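Reviewer's sketch of the write-then-rename pattern this hunk introduces (the same shape recurs in the other sidecar extractors). It is a standalone illustration under stated assumptions: `atomic_write_json` is a hypothetical name, and JSON stands in for YAML so the example needs only the standard library.

```python
import json
import os
from pathlib import Path


def atomic_write_json(path: Path, data: dict) -> None:
    # PID-unique temp name: two concurrent processes never write the same temp file.
    tmp = path.with_suffix(path.suffix + f".tmp.{os.getpid()}")
    try:
        with tmp.open("w", encoding="utf-8") as f:
            json.dump(data, f)
        # The rename is the commit point: readers see the old file or the new one.
        tmp.replace(path)
    except BaseException:
        # Covers KeyboardInterrupt too, so an aborted run leaves no stray temp file.
        tmp.unlink(missing_ok=True)
        raise
```

As in the diff, there is no fsync here; that is a reasonable trade only when the artifact can be regenerated cheaply after a crash.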

@@ -36,6 +36,7 @@ images are logged and the rest of the walk continues.
 from __future__ import annotations
 import json
+import os
 import shutil
 import subprocess
 import sys
@@ -133,6 +134,12 @@ def _read_exif_via_exiftool(image: Path) -> dict[str, Any]:
     entry. Numeric values come through as numbers; text values as
     strings. We accept missing keys silently.
     """
+    # exiftool does not reliably support `--` as an end-of-options
+    # marker, so make the path argument non-option-shaped instead: a
+    # relative path is prefixed with ./ so it can never start with `-`.
+    image_arg = str(image)
+    if not os.path.isabs(image_arg):
+        image_arg = os.path.join(os.curdir, image_arg)
     result = subprocess.run(
         [
             "exiftool",
@@ -156,7 +163,7 @@ def _read_exif_via_exiftool(image: Path) -> dict[str, Any]:
             "-ImageWidth",
             "-ImageHeight",
             "-n", # numeric output for shutter/aperture/GPS/dimensions
-            str(image),
+            image_arg,
         ],
         capture_output=True,
         text=True,
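Reviewer's sketch of the path handling in the two exiftool hunks above, isolated so it can be checked without exiftool installed. `non_option_path` is a hypothetical helper name; the logic mirrors the added lines.

```python
import os


def non_option_path(path: str) -> str:
    """Return a path string that a CLI cannot mistake for an option flag."""
    if not os.path.isabs(path):
        # "./-x.jpg" names the same file as "-x.jpg" but does not start with "-".
        return os.path.join(os.curdir, path)
    # Absolute paths begin with a separator (or drive), never with "-".
    return path


print(non_option_path("-evil.jpg"))  # on POSIX: ./-evil.jpg
```

This approach works with any tool, including ones that do not honor `--` as an end-of-options marker.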
@@ -374,12 +381,19 @@ def _is_stale(image: Path, sidecar: Path) -> bool:
 def _atomic_write_yaml(path: Path, data: dict[str, Any]) -> None:
-    tmp = path.with_suffix(path.suffix + ".tmp")
-    with tmp.open("w", encoding="utf-8") as f:
-        # Preserve the SIDECAR_KEYS order so a manual diff is easy to read.
-        ordered = {k: data[k] for k in SIDECAR_KEYS if k in data}
-        yaml.safe_dump(ordered, f, sort_keys=False, allow_unicode=True)
-    tmp.replace(path)
+    # PID-unique temp (concurrent runs can't share it), removed on
+    # failure. No fsync: sidecars are regenerated from the photo on the
+    # next build, so a lost rename costs one re-extraction, not data.
+    tmp = path.with_suffix(path.suffix + f".tmp.{os.getpid()}")
+    try:
+        with tmp.open("w", encoding="utf-8") as f:
+            # Preserve the SIDECAR_KEYS order so a manual diff is easy to read.
+            ordered = {k: data[k] for k in SIDECAR_KEYS if k in data}
+            yaml.safe_dump(ordered, f, sort_keys=False, allow_unicode=True)
+        tmp.replace(path)
+    except BaseException:
+        tmp.unlink(missing_ok=True)
+        raise
 def _read_one(image: Path) -> dict[str, Any]:

@@ -23,6 +23,7 @@ a palette extraction error.
 from __future__ import annotations
+import os
 import sys
 from pathlib import Path
 from typing import Any
@@ -62,10 +63,17 @@ def _is_stale(image: Path, sidecar: Path) -> bool:
 def _atomic_write_yaml(path: Path, data: dict[str, Any]) -> None:
-    tmp = path.with_suffix(path.suffix + ".tmp")
-    with tmp.open("w", encoding="utf-8") as f:
-        yaml.safe_dump(data, f, sort_keys=False, allow_unicode=True)
-    tmp.replace(path)
+    # PID-unique temp (concurrent runs can't share it), removed on
+    # failure. No fsync: sidecars are regenerated from the photo on the
+    # next build, so a lost rename costs one re-extraction, not data.
+    tmp = path.with_suffix(path.suffix + f".tmp.{os.getpid()}")
+    try:
+        with tmp.open("w", encoding="utf-8") as f:
+            yaml.safe_dump(data, f, sort_keys=False, allow_unicode=True)
+        tmp.replace(path)
+    except BaseException:
+        tmp.unlink(missing_ok=True)
+        raise
 def _extract_palette(image: Path) -> list[str]:

@@ -20,9 +20,11 @@
 set -u
 # Newly-added .md files under content/essays/ in this commit.
+# `--name-status` output is TAB-separated (status<TAB>path); split on the
+# tab so paths containing spaces survive intact.
 mapfile -t added < <(
     git diff --cached --name-status --diff-filter=A -- 'content/essays/*.md' \
-        | awk '{ print $2 }'
+        | cut -f2-
 )
 if [[ ${#added[@]} -eq 0 ]]; then
@@ -47,8 +49,10 @@ for path in "${added[@]}"; do
     # Best-effort frontmatter probe: does any line in the YAML head
     # block start with `status:`? Avoids a YAML dependency in the
     # hook, which has to run before the build environment is sourced.
-    if awk '/^---$/{f++; next} f==1 && /^status:[[:space:]]*[^[:space:]]/{print; exit}' \
-            -- "$path" \
+    # Probe the STAGED blob (`git show :path`), not the working tree —
+    # the commit contains the index content, which may differ.
+    if git show ":$path" 2>/dev/null \
+            | awk '/^---$/{f++; next} f==1 && /^status:[[:space:]]*[^[:space:]]/{print; exit}' \
             | grep -q .; then
         has_status=1
     fi
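Reviewer's sketch of why the first hunk switches from `awk '{ print $2 }'` to `cut -f2-`: whitespace splitting truncates a path at its first space, while splitting on the single tab keeps it whole. The sample output string below is made up for illustration, and `added_paths` is a hypothetical helper. It does not handle the quoting git applies to unusual characters in paths.

```python
def added_paths(name_status_output: str) -> list[str]:
    """Extract paths of added files from `git diff --name-status` style text."""
    paths = []
    for line in name_status_output.splitlines():
        # Each record is "<status>\t<path>"; split once, on the tab only.
        status, _, path = line.partition("\t")
        if status == "A" and path:
            paths.append(path)
    return paths


sample = "A\tcontent/essays/my new essay.md\nA\tcontent/essays/plain.md\n"

# Whitespace splitting (what awk's $2 does) cuts the first path short.
naive = [line.split()[1] for line in sample.splitlines()]
print(naive)                # ['content/essays/my', 'content/essays/plain.md']
print(added_paths(sample))  # ['content/essays/my new essay.md', 'content/essays/plain.md']
```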

@@ -148,7 +148,14 @@ fi
 echo "import-photo: stripping EXIF from delivered file..."
 magick mogrify -strip "$TARGET" \
-    || { echo "import-photo: magick mogrify -strip failed for $TARGET (EXIF NOT stripped)" >&2; exit 1; }
+    || {
+        # The copy under content/ still carries full EXIF (GPS, serial
+        # numbers); the Makefile's `git add content/` could auto-commit
+        # and publish it. Remove it before bailing out.
+        rm -f -- "$TARGET"
+        echo "import-photo: magick mogrify -strip failed for $TARGET (EXIF NOT stripped); deleted the copied target so the EXIF-laden JPEG cannot be auto-committed" >&2
+        exit 1
+    }
 # ---------------------------------------------------------------------------
 # Step 4: extract palette (does its own walk; idempotent on already-done photos)

@@ -28,7 +28,9 @@ echo -n "Signing subkey passphrase: "
 read -rs PASSPHRASE
 echo
-echo -n "$PASSPHRASE" | GNUPGHOME="$GNUPGHOME" "$GPG_PRESET" --homedir "$GNUPGHOME" --preset "$KEYGRIP"
+# printf, not `echo -n`: a passphrase starting with -e/-n/-E would be
+# eaten as an echo option.
+printf '%s' "$PASSPHRASE" | GNUPGHOME="$GNUPGHOME" "$GPG_PRESET" --homedir "$GNUPGHOME" --preset "$KEYGRIP"
 echo "Passphrase cached for keygrip $KEYGRIP (24 h TTL)."
 echo "Test: GNUPGHOME=$GNUPGHOME gpg --homedir $GNUPGHOME --batch --detach-sign --armor --output /dev/null /dev/null"

@@ -8,11 +8,29 @@ FREEZE="$REPO_ROOT/cabal.project.freeze"
 cd "$REPO_ROOT"
+# Back up the current freeze and restore it if resolution fails, so an
+# unsolvable index never leaves the repo with no freeze file at all
+# (recoverable via git, but the script shouldn't depend on that).
+BACKUP=""
+if [ -f "$FREEZE" ]; then
+    BACKUP="$(mktemp "$FREEZE.bak.XXXXXX")"
+    cp "$FREEZE" "$BACKUP"
+fi
+restore_on_failure() {
+    if [ -n "$BACKUP" ]; then
+        echo "==> Refreeze failed — restoring previous freeze file." >&2
+        mv "$BACKUP" "$FREEZE"
+    fi
+}
+trap restore_on_failure ERR
 echo "==> Removing stale freeze file..."
 rm -f "$FREEZE"
 echo "==> Resolving dependencies and writing new freeze file..."
 cabal freeze
+trap - ERR
+[ -n "$BACKUP" ] && rm -f "$BACKUP"
 echo "==> Verifying build..."
 cabal build
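Reviewer's sketch of the back-up, regenerate, restore-on-failure flow that the shell hunk above builds with `mktemp` and `trap ... ERR`, written in Python so the failure path is easy to exercise. `regenerate_with_backup` and its `regenerate` callback are hypothetical names; the callback stands in for `cabal freeze`.

```python
import os
import shutil
import tempfile
from pathlib import Path


def regenerate_with_backup(target: Path, regenerate) -> None:
    """Delete and rebuild `target`; put the previous copy back if rebuilding fails."""
    backup = None
    if target.exists():
        # Unique backup name next to the target, like mktemp "$FREEZE.bak.XXXXXX".
        fd, name = tempfile.mkstemp(prefix=target.name + ".bak.", dir=target.parent)
        os.close(fd)
        backup = Path(name)
        shutil.copy2(target, backup)
    try:
        target.unlink(missing_ok=True)
        regenerate(target)
    except BaseException:
        if backup is not None:
            # Restore, so a failed run never leaves the repo without the file.
            backup.replace(target)
        raise
    else:
        if backup is not None:
            backup.unlink()
```

As in the script, the backup is removed only once regeneration has succeeded; the later verification step (`cabal build`) runs outside the protected region.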

@@ -49,8 +49,19 @@ def stamp_file(path: str, replacement_bytes: bytes) -> bool:
         data,
     )
     if count and new_data != data:
-        with open(path, "wb") as f:
-            f.write(new_data)
+        # Write to a sibling temp file and os.replace so an interrupt
+        # mid-write never leaves a truncated deployed HTML file.
+        tmp = path + ".stamp-tmp"
+        try:
+            with open(tmp, "wb") as f:
+                f.write(new_data)
+            os.replace(tmp, path)
+        except BaseException:
+            try:
+                os.unlink(tmp)
+            except FileNotFoundError:
+                pass
+            raise
         return True
     return False

11
uv.lock

@@ -156,6 +156,15 @@ wheels = [
     { url = "https://files.pythonhosted.org/packages/e7/05/c19819d5e3d95294a6f5947fb9b9629efb316b96de511b418c53d245aae6/cycler-0.12.1-py3-none-any.whl", hash = "sha256:85cef7cff222d8644161529808465972e51340599459b8ac3ccbac5a854e0d30", size = 8321, upload-time = "2023-10-07T05:32:16.783Z" },
 ]
+[[package]]
+name = "einops"
+version = "0.8.2"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/2c/77/850bef8d72ffb9219f0b1aac23fbc1bf7d038ee6ea666f331fa273031aa2/einops-0.8.2.tar.gz", hash = "sha256:609da665570e5e265e27283aab09e7f279ade90c4f01bcfca111f3d3e13f2827", size = 56261, upload-time = "2026-01-26T04:13:17.638Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/2a/09/f8d8f8f31e4483c10a906437b4ce31bdf3d6d417b73fe33f1a8b59e34228/einops-0.8.2-py3-none-any.whl", hash = "sha256:54058201ac7087911181bfec4af6091bb59380360f069276601256a76af08193", size = 65638, upload-time = "2026-01-26T04:13:18.546Z" },
+]
 [[package]]
 name = "faiss-cpu"
 version = "1.13.2"
@@ -364,6 +373,7 @@ dependencies = [
     { name = "altair" },
     { name = "beautifulsoup4" },
     { name = "colorthief" },
+    { name = "einops" },
     { name = "faiss-cpu" },
     { name = "matplotlib" },
     { name = "numpy" },
@@ -379,6 +389,7 @@ requires-dist = [
     { name = "altair", specifier = ">=5.4,<6" },
     { name = "beautifulsoup4", specifier = ">=4.12,<5" },
     { name = "colorthief", specifier = ">=0.2,<1" },
+    { name = "einops", specifier = ">=0.8.2,<1" },
     { name = "faiss-cpu", specifier = ">=1.9,<2" },
     { name = "matplotlib", specifier = ">=3.9,<4" },
     { name = "numpy", specifier = ">=2.0,<3" },