Compare commits

23 Commits

Author SHA1 Message Date
Levi Neuwirth 5d344f940e Last audit stragglers: scaffolder, refreeze safety, atomic-write polish
- add-popup-source.sh: slug validated against ^[a-z0-9-]+$ before nginx
  interpolation; UPSTREAM_HOST derived unconditionally so the CSP
  reminder fires in the no-proxy case — which is exactly when the host
  must be added to connect-src (AUDIT §4.8)
- refreeze.sh: backs up the freeze and restores it on a failed resolve
  instead of leaving the repo with no freeze file (§4.9)
- einops gets the policy-mandated upper bound and a comment naming its
  consumer (nomic's remote modeling code) (§1.5)
- Makefile: pdftoppm failures warn instead of vanishing in the while
  pipeline; .NOTPARALLEL guards deploy's clean->build->sign ordering
  against -j invocations (§8.4)
- Atomic writers (embed, archive, the three sidecar extractors):
  PID-unique temp names so concurrent runs can't interleave, cleanup on
  failure everywhere, fsync where the artifact is not trivially
  regenerable (§4.10)

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 11:43:14 -04:00
Levi Neuwirth 23bc2d0dc1 Frontend tail: keyboard access, idempotence, input edge cases
- gallery.js: math/score focus overlays are keyboard-activatable
  (role=button, tabindex, Enter/Space) and focus return on close lands
  on a focusable trigger (AUDIT §5.7)
- annotations.js: marks are focusable; Enter/Space pins the tooltip
  with focus moved to its Delete button, Escape dismisses — the delete
  affordance is finally reachable without a mouse (§5.7)
- transclude.js: nested transclusions resolve (depth-capped at 3, with
  ancestor-chain cycle rejection rendering the existing error style);
  collapse.js reinit is idempotent via data-collapse-bound (§5.7)
- copy.js excludes the button label from code-less <pre> copies;
  score-reader.js stops rewriting plain loads to ?p=1; search-filters
  treats non-numeric threshold input as inactive instead of a
  match-everything >=0 filter; selection-popup no longer re-summons
  the toolbar while typing capitals in the annotation picker (§5.8)

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 11:25:19 -04:00
Levi Neuwirth 9f61ce5949 Tooling, manifest, and content polish
- import-photo.sh deletes the copied JPEG when EXIF stripping fails, so
  the auto-commit can never publish GPS/serial metadata (AUDIT §4.11)
- pre-commit-marks hook: tab-aware path parsing, probes the staged blob
  rather than the working tree (§4.11)
- preset-signing-passphrase uses printf; stamp-build-time writes via
  temp + os.replace; archive.py passes -- to pdftotext and verifies the
  vendored monolith binary against its recorded sha256 (mismatch is
  fatal, consistent with the tool's integrity contract); extract-exif
  ./-prefixes relative paths (§4.11)
- blog-post.html: id="similar-links"/"backlinks" each appear once;
  rendered output unchanged (§6.4)
- site.webmanifest: start_url/scope/description added, maskable icon
  purpose restored alongside any (§9.3)
- Frontmatter cleanup: scaffold comments out of scaling_outage,
  dangling null confidence-history keys removed (populated ones kept),
  dead modified: key dropped from colophon (§6.4)
- canto31.jpg: 4.0 MB -> 1.9 MB (2400px, q80, grayscale — the source
  is a monochrome Doré engraving, so single-channel is colorimetrically
  lossless); webp sidecar regenerated (§6.4, prior-audit §6.1)

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 11:13:34 -04:00
Levi Neuwirth 56afdb867a Feature modules: URL normalization, Maybe-trust, proper medians
- Empty/all-comments manifest.yaml is the empty archive, not a fatal
  parse error (AUDIT §3.11)
- Backlinks normaliseUrl strips index.html like SimilarLinks, so links
  to canonical directory URLs invert again; Stats normUrl updated in
  lockstep (§3.12)
- PDF viewer file= query value percent-encoded (hand-rolled RFC 3986
  encoder; network-uri is not a dependency) (§3.13)
- Photography feed thumbnails embed for flat singles and series
  children, not just directory entries (§3.14)
- Marks trust is Maybe Int: missing confidence/evidence collapses the
  figure to the bare frame as documented, instead of a literal
  "0 TRUST"; result-shape glyph centers when no score (§3.15)
- Unknown catalog categories fold into one Other bucket; medians take
  the mean of middle elements; protocol-relative URLs excluded from
  backlinks; @string/@comment/@preamble skipped in BibTeX parsing;
  watch-staleness of the once-per-process archive reads documented;
  stale comments fixed (§3.16, §3.9)

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 11:13:34 -04:00
Levi Neuwirth f254ce866e Filters: fence/code-span awareness, host matching, nested-header skip
- SourceRefs trigger whitelist aligned to the /source/ serving
  whitelist (drops content/, yaml-source/, broad static//tools//data
  prefixes; adds .bib); existsCached no longer memoizes non-existence,
  so files created under make watch are picked up (§2.5, §2.16)
- fill/stroke hex replacement is boundary-aware: #000080 and 8-digit
  RGBA forms can no longer be corrupted into currentColor80 (§2.12)
- Wikilinks/Transclusion/EmbedPdf skip fenced code blocks (shared
  CommonMark fence tracker), and wikilinks additionally skip inline
  code spans — the syntax-documentation essay now renders its own
  examples literally while live wikilinks still convert (verified both
  ways in output) (§2.13)
- domainIcon matches the extracted host by label suffix instead of
  substring-of-URL; extractHost also strips userinfo (§2.14)
- webpSrc escaped in srcset; internal PDF links no longer double-
  classified; Smallcaps/Archive header-skip now holds at every nesting
  depth via protect/restore walks (§2.17)

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 11:13:08 -04:00
Levi Neuwirth c8eeaaa9bc Core build cleanups: guards, pattern unification, noResult hygiene
- Library page no longer hard-depends on content/library.md; deleting
  it degrades to no intro block (AUDIT §2.8)
- primaryPortalOf accepts scalar comma-form tags via getTags, matching
  the tag system (§2.9)
- allContent gains me/ and memento-mori/ so their outgoing links join
  the backlinks graph; photography exclusion now documented (§2.10)
- Paginated tag pages partition AND sort by the same revision-aware
  display date — cross-page order is monotone again (§2.11)
- New stripPrefixRoute replaces gsubRoute at 17 call sites: prefix-only
  stripping, no mid-path mangling; route inventory verified identical
  (§2.15)
- random-pages uses canonical patterns (collection poems randomizable);
  pattern literals replaced with Patterns imports; duplicate local
  poetry patterns deleted; flat/collection poetry rules merged (§2.17)
- noResult instead of empty-list/fail for tagLinksField, dotsField,
  abstract/description/summary/bibliography/further-reading, plus the
  confidence-trend, overall-score, has-score, has-movements, and
  movement-audio fields — no more empty wrappers or [ERROR] log noise
  for legitimately-absent values (§2.17)
- tagItemCtx composes siteCtx, so monograms render on tag pages (§2.17)
- readingTime ceilings (399 words -> 2 min); authorSlugify comment
  fixed to match behavior, code untouched for URL stability; stale
  portal-count comments corrected (§2.17)

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 11:13:08 -04:00
Levi Neuwirth 945086421a embed.py: hash-cache the paragraph pass; drop the dead mtime skip
The 'skip if outputs newer than every HTML' check could never fire:
stamp-build-time.py rewrites every page's footer AFTER embed.py runs,
so the comparison was always false and the full MiniLM paragraph pass
(and model load) ran on every build (AUDIT §4.3). Replaced with the
same content-hash cache the page pass already had — generalized
load/save_vec_cache, keyed by sha256 of the input text, invalidated on
model/revision/dim change. A no-change rerun now does no model loads:
measured 97s cold -> 4.8s warm.

Also strips section.footnotes from extraction: the new no-JS fallback
duplicates each sidenote's text at document end, which would double
footnotes in search results and skew page similarity.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 10:51:01 -04:00
Levi Neuwirth b2951c0c2c Branding diet: logo sprite via <use>, lean favicon.ico, simple mask icon
- The ~33 KB traced logo moves from an inlined-per-page partial to
  /logo-sprite.svg referenced with <use> — cached once instead of
  shipped on every page (homepage HTML: 46 KB -> 13 KB). CSS custom
  properties cascade into the use shadow tree, so the two-tone cutout
  is unchanged (AUDIT §9.1)
- favicon.ico regenerated at 16/32/48 from the 512px master: 71 KB ->
  15 KB; modern browsers take the SVG anyway, the .ico is the legacy
  fallback (§9.2)
- link-icons/internal.svg restored to the simple 4 KB path: it renders
  at 0.7-1.6 rem through a CSS mask, where the 33 KB traced detail
  cannot resolve (§9.2)

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 10:43:06 -04:00
Levi Neuwirth aeb2937f7c Drafts are local-only: untrack the four committed ones
.gitignore has declared content/drafts/ local-only working notes since
the rule was added, but four drafts were already tracked — ignore rules
don't untrack, so make build's auto-commit kept staging and deploy kept
pushing them (AUDIT §6.3). Untracked with --cached; the files remain on
disk and still build in dev. Also moved inclusionist-manifesto.md into
drafts/essays/ where the draft rule actually matches it (§6.1), and
un-shadowed the tracked .env.example from the credential patterns.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 10:40:05 -04:00
Levi Neuwirth 8ca22a45d2 Sidenotes: emit the section.footnotes fallback the CSS expects
The filter consumes every Pandoc Note, so the "standard Pandoc-
generated section.footnotes" its doc claimed as the no-JS fallback
never existed — below 1500px with JS disabled, footnote content was
simply invisible (AUDIT §2.3). The filter now collects consumed notes
and appends the section itself: letter labels, jump targets for the
in-text refs (which now point at the visible fallback item), and
doc-backlink returns. sidenotes.js pairs ref/note by element id and
preventDefaults clicks, so behavior with JS is unchanged.

Verified in output: per-page item count matches inline sidenote count;
refs target #fn-<label>; backlinks target #snref-<label>.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 10:37:28 -04:00
Levi Neuwirth 4e28c82e4c Fix SIMD essay repository URL: add missing owner segment
https://git.levineuwirth.org/where-simd-helps returned 404; the
owner-qualified form returns 200 (AUDIT §6.2).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 09:44:26 -04:00
Levi Neuwirth 8040be1aee Docs: align WRITING.md and README with the implementation
- js: page-script paths are site-root-relative, not content-relative
  (AUDIT §7.1)
- directory-form standalone pages need a dedicated Site.hs rule; flat
  content/<page>.md is the generic form (§7.2)
- portal table: add the missing Photography row (§7.3)
- document the implemented-but-undocumented summary:, revised:, and
  keywords: fields, including a Revision dates section (§7.4)
- default citation style is Chicago Notes Bibliography, not
  Author-Date; hover previews come from popups.js, not the deleted
  citations.js (§7.5)
- history: entries may be authored in any order (sorted at build
  time); examples reordered newest-first (§3.5)
- README: make watch runs Hakyll's live-reload preview server (§7.5)

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 09:43:25 -04:00
Levi Neuwirth caa113e036 Frontend: search races, lightbox a11y, popup edge cases
- semantic-search.js: generation token prevents stale results from
  rendering over newer queries; in-flight dedup on the index fetch;
  index/meta size consistency check fails loudly instead of NaN
  ranking (AUDIT §5.5)
- lightbox.js: triggers keyboard-activatable (role=button, tabindex,
  Enter/Space); Tab trapped inside the aria-modal overlay, modeled on
  gallery.js (§5.6)
- nav.js: portal toggle persists via guarded safeStorage so
  storage-blocked contexts can't kill the toggle (§5.7)
- popups.js: provider url() throws (malformed percent-encoding) are
  treated as no-popup; future dates render nothing instead of
  "N days ago" (§5.7)
- search.js: missing PagefindUI degrades to a console warning instead
  of aborting the whole handler (§5.7)
- citations.js: deleted — dead code superseded by popups.js (§5.7)

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 09:43:25 -04:00
Levi Neuwirth c17c203747 Tooling robustness: atomic writes, verified downloads
- archive.py: PROVENANCE.json / archive-index.json / archive-state.json
  now written atomically (tmp + os.replace) — a truncated integrity
  record is the one thing this tool must never produce (AUDIT §4.4);
  manifest entries validated as mappings up front (§4.7); refresh
  rejects provenance with a missing/empty artifact key instead of
  crashing on IsADirectoryError (§4.7); wayback save URL quotes
  unsafe characters (§4.7)
- download-leaflet.sh: existing files are re-verified before being
  skipped, and downloads land in a .part temp moved into place only
  after checksum verification — a failed verification can no longer
  leave a bad file that the next run silently accepts (§4.5)
- download-model.sh, convert-images.sh: same temp-then-move pattern so
  interrupted downloads/conversions never persist at final paths (§4.6)

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 09:43:25 -04:00
Levi Neuwirth c68d03af31 Fix audit MEDs in feature modules
- Backlinks: handle Plain blocks (tight list items) and DefinitionList
  in link extraction — links in ordinary bullet lists were invisible to
  the backlinks system (AUDIT §3.3)
- Sidenotes: render note bodies with a KaTeX writer so footnote math
  reaches the client-side KaTeX pass instead of degrading to italics
  (§2.4)
- Archive: join manifest to provenance on normalised URLs like every
  other comparison in the system — an equivalent-form URL edit silently
  unpublished the page while links kept pointing at it (§3.6)
- Photography: flat singles get their basename as slug and root-level
  asset paths in map.json (§3.7); geo-precision now fails closed — an
  unrecognised value (typo'd "hidden") suppresses the pin instead of
  publishing rounded coordinates (§3.8)
- Stability: age is measured first-commit -> today, not the commit
  span, so quiet time stabilises a piece as documented (§3.4);
  history: entries are sorted newest-first by date regardless of
  authored order (§3.5); pinned pages format last-reviewed like the
  git branch (§3.10)

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 09:43:25 -04:00
Levi Neuwirth 902e43ea19 Add /poetry/ and /fiction/ indexes; widen tag-collision guard
Nav, the home portal grid, and the library have linked both URLs since
the portals were added, but no rule generated either index — confirmed
404s in production (AUDIT §2.1). Both rules mirror the essays index;
fiction renders an empty list until content exists.

sectionOwnedTopLevelTags now lists every namespace owning a
<name>/index.html route, not just photography — Hakyll silently
overwrites on duplicate routes, so an essay tagged e.g. "music" would
have clobbered a real section landing (AUDIT §2.2).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 09:25:50 -04:00
Levi Neuwirth f11495ff9a Fix audit tooling/infra findings
- embed.py: pin nomic's auto_map modeling repo via code_revision —
  revision= alone left nomic-bert-2048 unpinned under
  trust_remote_code (AUDIT §1.3; verified loadable with
  HF_HUB_OFFLINE=1). Catch BadZipFile/EOFError when loading the page
  cache so a half-written npz is discarded, not fatal (§4.2), and
  unlink the tmp file on a failed save (§4.1)
- nginx: collapse the CSP to one physical line — nginx has no line
  continuation in quoted strings, so the old value embedded literal
  backslash+LF bytes, illegal in HTTP/2 (§8.1). Add the externals the
  site actually uses: KaTeX webfonts + onnxruntime wasm via jsdelivr,
  and the popup provider APIs popups.js documents (§8.2)
- Makefile: pathspec-limit the auto-commit to content/ so pre-staged
  unrelated work is no longer swept into auto: commits (§8.3)

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 09:21:47 -04:00
Levi Neuwirth c64f3d63c0 Fix audit frontend MEDs
- score-reader template: load utils.js before theme.js — without
  lnUtils.safeStorage the saved theme/text-size never restored on
  score pages (AUDIT §5.1)
- search-filters: expand trailing-slash pathnames to .../index.html
  before the epistemicMeta lookup; clean-URL pages were silently
  bypassing every active filter (AUDIT §5.2)
- viz: treat cappuccino as a dark theme so charts stop rendering
  near-black marks on a dark brown background (AUDIT §5.3)
- collapse: namespace section-collapsed keys by pathname (Pandoc
  auto-slugs recur across essays) and go through safeStorage like the
  rest of the site (AUDIT §5.4)

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 09:21:47 -04:00
Levi Neuwirth 7ca937d98c Fix audit HIGHs/MEDs in build code
- ArchiveIndex: guard rawIndex/rawState with doesFileExist so a fresh
  clone (gitignored data/ JSONs absent) degrades to empty instead of
  crashing — the behavior the module doc already promised (AUDIT §1.2)
- Commonplace: decode YAML via encodeUtf8, not Char8.pack, which
  truncates codepoints above 0x7F (AUDIT §3.2)
- Stats: DayOfWeek is ISO-numbered (Mon=1..Sun=7); dowOf and weekStart
  assumed Mon=0..Sun=6, clipping every Sunday cell outside the heatmap
  viewBox and starting weeks on Sunday (AUDIT §3.1)
- Site: epistemicEntry now honors the proved/proven confidence sentinel
  like Contexts.overallScoreField (AUDIT §2.6)
- Contexts: affiliationField returns noResult instead of an empty list,
  so essays without affiliation no longer render an empty meta row
  (AUDIT §2.7)

Verified: full site build passes; proved page gets score=100 in
epistemic-meta.json; empty .meta-affiliation gone; heatmap rows
y=22..94 all inside the 104-high viewBox.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 09:21:30 -04:00
Levi Neuwirth 70ad44e9f4 Add 2026-06-09 repository audit findings
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-09 18:57:43 -04:00
Levi Neuwirth 7c5354efa7 embed.py: split page vs paragraph embedding models
Pages (similar-links.json, build-only) move to nomic-embed-text-v1.5
(768d) with an on-disk npz cache; paragraphs (browser semantic search)
stay on all-MiniLM-L6-v2 (384d), so the client contract is unchanged.
WRITING.md search row updated accordingly. einops added for nomic's
remote modeling code; cache gitignored with a trailing glob so
interrupted-write debris is covered too.

Known follow-ups (AUDIT-2026-06-09.md §1.3, §4): pin the
nomic-bert-2048 remote code, catch BadZipFile in cache loads, fix the
staleness check defeated by stamp-build-time ordering.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-09 18:57:43 -04:00
Levi Neuwirth 37665f67db Branding: traced logo mark, regenerated favicons, og-image
New inline logo-mark.svg partial in the nav (two-tone cutout via
--logo-ink/--logo-bg), regenerated favicon set + web-app manifest icons
from the new mark, 1200x630 og-image wired into head.html.

Known follow-ups (AUDIT-2026-06-09.md §9): the traced SVG is ~33 KB
inlined per page, favicon.ico carries 128/256px entries, and the
webmanifest dropped its maskable purpose.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-09 18:57:34 -04:00
Levi Neuwirth a7b3b9cd07 Refreeze after system update: distributive 0.6.3 et al.
The pinned distributive 0.6.2.1 conflicted with the pacman package db
(comonad-5.0.10 built against 0.6.3), making a fresh solve impossible —
same failure mode as the 2026-05-07 audit's aeson pin. Regenerated via
tools/refreeze.sh; cabal build --dry-run now resolves.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-09 18:57:25 -04:00
93 changed files with 2919 additions and 1055 deletions

6
.gitignore vendored

@@ -10,6 +10,9 @@ _cache/
**/.env
**/.env.*
**/*.env
# .env.example is documentation (tracked), not a credential file — the
# patterns above would otherwise shadow it for status/add purposes.
!.env.example
**/*.key
**/*.pem
**/*.p12
@@ -73,6 +76,9 @@ data/build-stamp.txt
data/last-build-seconds.txt
data/semantic-index.bin
data/semantic-meta.json
# Both embed caches (pages + paragraphs); the trailing glob also
# catches interrupted-write debris (.tmp / .tmp.npz)
data/embed-cache-*
# Archive: generated text + its staleness stamp (recreated from the
# committed artifact on every build — deterministic, so committing them is

931
AUDIT-2026-06-09.md Normal file
View File

@@ -0,0 +1,931 @@
---
title: Repository audit
date: 2026-06-09
---
# Repository audit — levineuwirth.org (2026-06-09)
Comprehensive audit of the repo on `main` at commit `620b974` (working tree
modified: branding refresh across `static/` + `templates/partials/`, plus
`tools/embed.py` rework; untracked `static/og-image.png`,
`templates/partials/logo-mark.svg`, `data/embed-cache-pages.npz.tmp.npz`).
Severity legend: **HIGH** (likely to break a build, cause data loss, or
expose a security weakness) — **MED** (latent bug, brittleness, or
documentation drift) — **LOW** (minor robustness gap or fragile assumption) —
**NIT** (style, polish, or paranoia).
Numbers are file:line against the working tree at audit time. Findings
marked "verified" were reproduced empirically (solver runs, built `_site/`
output inspection, live HTTP checks, binary parsing); the rest were
confirmed by reading the code.
Prior audit: `AUDIT.md` (2026-05-07). Follow-up status in §10.
---
## 1. Build & dependency chain
### 1.1 `cabal.project.freeze` is unsolvable again — next clean build fails — **HIGH**
`cabal build --dry-run` fails today (verified): the freeze pins
`distributive ==0.6.2.1`, but the system (pacman) GHC package db has
`comonad-5.0.10` built against `distributive-0.6.3`:
```
rejecting: distributive-0.6.3/installed... (constraint from
cabal.project.freeze requires ==0.6.2.1)
After searching the rest of the dependency tree exhaustively...
```
The conflict set also names aeson, warp, hakyll, http2, semigroupoids. This
is the same failure mode as prior-audit §1.1 — that audit's specific aeson
pin was fixed (now 2.2.2.0/hashable 1.4.7.0), but a different package broke
the same way after a system update. Recent builds succeed only off the
cached `dist-newstyle/cache/plan.json`; the freeze file has since changed,
so the next cabal invocation re-solves and fails. Because `make deploy`
starts with `make clean`, the next deploy hits this. `levineuwirth.cabal`'s
own bounds are compatible with the freeze — the conflict is
freeze-vs-installed-db, not freeze-vs-cabal-file.
Fix: `tools/refreeze.sh` (written for exactly this post-`pacman -Syu`
situation). The underlying fragility — freezing against a mutable system
package db — remains; consider documenting the refreeze step as part of any
system-upgrade ritual. *(In progress at time of writing.)*
### 1.2 Missing `data/archive-index.json` / `archive-state.json` crashes the build — **HIGH**
`build/ArchiveIndex.hs:134-146`. The module doc (lines 18-22) promises "An
absent or malformed file degrades safely: an empty index makes the link
consumers no-op; an absent state file makes every entry @Live@." But
`rawIndex = unsafePerformIO $ do decoded <- A.eitherDecodeFileStrict' indexPath`
(and identically `rawState`) never checks `doesFileExist`, and aeson's
`eitherDecodeFileStrict'` throws an uncaught `IOException` on a missing
file (verified: `withBinaryFile: does not exist`). Both files are
gitignored (`.gitignore:84-85`), so a fresh clone or a no-`.venv` build —
the exact path `build/Archive.hs:20-24` promises to support — throws when
the CAF is first forced. Contrast `readUrlSet` (line 109) in the same file,
which guards correctly. Currently latent on this machine only because both
generated files happen to exist.
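
The "degrades safely" contract described above can be sketched as follows. This is a Python analogue for illustration only, not the Haskell code in `build/ArchiveIndex.hs`; the helper name `load_json_or_default` is hypothetical.

```python
import json
from pathlib import Path
from typing import Any


def load_json_or_default(path: Path, default: Any) -> Any:
    """Return parsed JSON from `path`, or `default` when the file is
    absent or malformed, so a fresh clone degrades instead of crashing."""
    try:
        with path.open("r", encoding="utf-8") as fh:
            return json.load(fh)
    except FileNotFoundError:
        # Gitignored generated file not present yet (fresh clone).
        return default
    except json.JSONDecodeError:
        # Half-written or otherwise malformed file.
        return default
```

The point of the pattern is that the absent-file case is handled at the read site rather than assumed away by whoever generated the file last.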
### 1.3 `embed.py` `trust_remote_code=True` executes unpinned third-party code — **HIGH**
`tools/embed.py:329` (line ~341 in the uncommitted version). The new
page-model load is
`SentenceTransformer(PAGE_MODEL_NAME, revision=PAGE_MODEL_REVISION, trust_remote_code=True)`.
The `revision` arg pins only the `nomic-ai/nomic-embed-text-v1.5` repo; the
actual modeling code is pulled via `auto_map` from a *different* repo —
verified in the local HF cache: the executed code lives under
`transformers_modules/nomic_hyphen_ai/nomic_hyphen_bert_hyphen_2048/...`,
i.e. `nomic-ai/nomic-bert-2048` at its current head, which nothing pins. A
compromise of that second repo runs arbitrary Python at build time, in a
repo whose every other download path (download-model.sh, pdfjs, leaflet) is
sha256-pinned. The comment "Both pins are deliberate" is therefore
misleading. Fix: pin via `code_revision`, or run with `HF_HUB_OFFLINE=1`
after first fetch, or document the accepted risk.
### 1.4 Working-tree commit hazard: tracked templates reference untracked files — **HIGH (process)**
`templates/partials/nav.html:5` (tracked, modified) adds
`$partial("templates/partials/logo-mark.svg")$` and
`templates/partials/head.html` references `/og-image.png` — both target
files are **untracked** (no git history). Committing the template diff
without `git add`-ing both breaks every page's Hakyll build on a fresh
clone (`$partial$` aborts compilation) and 404s the og:image. They must
land in the same commit. Conversely, `data/embed-cache-pages.npz.tmp.npz`
must **not** be committed (see §4.1). The partial itself is safe as a
Hakyll template (verified: zero `$` characters; `match "templates/**"`
compiles it).
### 1.5 `einops` dependency: undocumented, unbounded, imported nowhere — **LOW**
`pyproject.toml:27` adds `einops>=0.8.2`. No import anywhere in
`tools/`/`build/`/`static/js/`; its only consumer is nomic's
`trust_remote_code` module (§1.3). Every sibling dependency has an
explanatory comment and an upper bound per the file's own stated policy
("Upper bounds are intentionally generous (next major) but always
present"); einops has neither. `uv lock --check` passes (0.8.2 pinned).
---
## 2. Haskell build code — core
### 2.1 Nav, home grid, and library link `/fiction/` and `/poetry/` — confirmed 404s — **MED**
`build/Site.hs:50-60` (`homePortals` contains `("Fiction","fiction")`,
`("Poetry","poetry")`), `templates/partials/nav.html:56,61`,
`templates/library.html:44,58`. No rule generates either index: fiction and
poetry are not in `tagIndexable` (`build/Patterns.hs:148-151` = essays +
blog + photos) and Site.hs has no landing rule. Verified: `_site/fiction`
does not exist; `_site/poetry/` has no `index.html`. nginx has no
redirects. Both links 404 in production today.
### 2.2 Tag/route collisions guarded for `photography` only — **MED**
`build/Tags.hs:98-99`. `tagIdentifier` maps tag `t` → `t ++ "/index.html"`;
`sectionOwnedTopLevelTags = ["photography"]` is the only guard. A
tagIndexable item tagged `music` (or `music/x`, which expands to `music`)
emits `music/index.html`, already owned by the music index route
(`build/Site.hs:486-487`); similarly `essays`, `blog`, `cv`, `archive`,
`authors`, `bibliography`. Hakyll does not error on duplicate routes — one
silently overwrites the other.
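
Because the overwrite is silent, a collision check has to be explicit. A minimal sketch in Python, assuming the route table is available as `(source, route)` pairs; the function name and the sample identifiers are hypothetical, and Hakyll itself exposes no such helper as far as this audit knows.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def find_route_collisions(
    routes: Iterable[Tuple[str, str]],
) -> Dict[str, List[str]]:
    """Map each output route claimed by more than one source to its claimants."""
    owners: Dict[str, List[str]] = defaultdict(list)
    for source, route in routes:
        owners[route].append(source)
    return {route: srcs for route, srcs in owners.items() if len(srcs) > 1}


# Hypothetical route table: a tag page and a section landing both claim
# music/index.html.
table = [
    ("tag:music", "music/index.html"),
    ("content/music/index.md", "music/index.html"),
    ("tag:ai", "ai/index.html"),
]
collisions = find_route_collisions(table)
```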
### 2.3 Sidenotes filter destroys the documented no-JS fallback — **MED**
`build/Filters/Sidenotes.hs:30-36` vs `static/css/sidenotes.css:125-135`.
The module doc claims the Pandoc `<section class="footnotes">` "serves as
fallback," but `apply` replaces every `Note`, so the writer never emits the
section. CSS depends on it below 1500px. Verified in output:
`_site/essays/scaling_outage.html` has 3 `class="sidenote"` and zero
`footnotes` occurrences. With JS disabled, footnote content is invisible on
narrow viewports. The comment, the CSS, and ozymandias.md's own prose all
contradict actual behavior.
### 2.4 Sidenote bodies rendered without the KaTeX writer — **MED**
`build/Filters/Sidenotes.hs:103-115`. `inlinesToHtml`/`blocksToHtml` use
`writeHtml5String (def :: WriterOptions)` (PlainMath), while the main
pipeline uses `KaTeX ""` (`build/Compilers.hs:47`). Math inside a footnote
never gets `<span class="math inline">\(...\)</span>`, so KaTeX never
renders it — degrades to plain italics, silently inconsistent with body
math.
### 2.5 SourceRefs whitelist vs `/source/` serving whitelist have drifted — **MED**
`build/Filters/SourceRefs.hs:114-141` vs `build/Site.hs:217-240`. Site.hs:209
says "must stay aligned with 'isSourcePath'". Mismatches: SourceRefs wraps
`content/` and `yaml-source/` (no Site counterpart); `static/` + any known
ext vs Site's `static/js/**`/`static/css/**` only; `tools/` + any ext vs
Site's `tools/**.sh`/`tools/**.py`; `data/` at any depth vs Site's
top-level `data/*.{json,yaml,md,bib}`. Each mismatch yields a wrapped
source-ref whose popup fetch 404s (Forgejo href fallback still works).
Inverse: Site serves `data/*.bib` but `.bib` is missing from
`hasKnownExt` — dead whitelist entry.
### 2.6 `epistemicEntry` ignores `confidence: proved` — **MED**
`build/Site.hs:1014-1024`. Comment: "Compute overall-score the same way
Contexts.overallScoreField does," but it uses
`readMaybe =<< lookupString "confidence" meta`, which is `Nothing` for
`"proved"`/`"proven"`, whereas `Contexts.overallScoreField`
(`build/Contexts.hs:574-576`) substitutes 100 via `isProvedConfidence`.
Proved pages get no `score` in `data/epistemic-meta.json` and export the
raw string under `confidence`, so client-side filtering silently misses
them.
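
The sentinel handling can be sketched like this. It is a Python illustration of the confidence parse only, not the Haskell `overallScoreField`; the function name is hypothetical, and treating the sentinel case-insensitively is an assumption.

```python
from typing import Optional

# Sentinel spellings named in this finding.
PROVED_SENTINELS = {"proved", "proven"}


def confidence_score(raw: Optional[str]) -> Optional[int]:
    """Parse a `confidence:` value; proved/proven counts as 100."""
    if raw is None:
        return None
    value = raw.strip().lower()  # case-insensitivity is an assumption
    if value in PROVED_SENTINELS:
        return 100
    try:
        return int(value)
    except ValueError:
        return None
```

Sharing one parser between the two call sites is what keeps the exported JSON and the rendered page from disagreeing.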
### 2.7 Empty affiliation `<div>` ships on every essay without `affiliation:` — **MED**
`build/Contexts.hs:84-89` + `templates/partials/metadata-tail.html:12`.
`affiliationField` returns an empty list instead of `noResult`; Hakyll's
`$if$` is truthy for empty list fields (the codebase knows this —
`tagLinksFieldExcludingScope` uses `noResult` for exactly this reason).
Verified in output: `_site/essays/asymmetric-forgetting.html` contains
`<div class="meta-row meta-affiliation">` with whitespace-only content.
### 2.8 Library page hard-depends on `content/library.md` — **LOW**
`build/Site.hs:675`. `_ <- loadSnapshot libraryIntroId "body"` is a
top-level compiler statement (not inside a `field`), so it's a hard
failure. The block is documented as "optional prose block"; deleting
`content/library.md` breaks the whole `library.html` compile. Contrast the
existence-guarded sidecars at `build/Tags.hs:277-283` and
`build/Site.hs:843-850`.
### 2.9 Library `primaryPortalOf` reads only list-form `tags:` — **LOW**
`build/Site.hs:632-638`. `lookupStringList "tags"` returns `Nothing` for
scalar comma form (`tags: research, ai`), which Hakyll's `getTags`
accepts. Such an item appears on tag pages but is silently dropped from
the library. All current content uses list form — latent.
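
A reader that accepts both forms might look like the sketch below. It is a Python illustration under the assumption that the frontmatter value arrives either as a string or as a list; `get_tags` here is a hypothetical stand-in, not Hakyll's function of the same name.

```python
from typing import List, Optional, Sequence, Union


def get_tags(value: Optional[Union[str, Sequence[str]]]) -> List[str]:
    """Accept list-form tags and scalar comma-form (`tags: research, ai`)."""
    if value is None:
        return []
    if isinstance(value, str):
        parts = value.split(",")
    else:
        parts = [str(v) for v in value]
    # Trim whitespace and drop empty entries from either form.
    return [p.strip() for p in parts if p.strip()]
```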
### 2.10 `allContent` omits me/, memento-mori/, photography from the link graph — **LOW**
`build/Patterns.hs:124-133`, used by `build/Backlinks.hs:334,345`. Despite
"Every content file the backlinks pass should index," `content/me/index.md`
and `content/memento-mori/index.md` (full essays, rendered with
`backlinksField`) never have their outgoing links extracted; photography
likewise. Either deliberate-but-undocumented or the exact silent omission
the module header says it exists to prevent.
### 2.11 Paginated tag pages: split by creation date, sorted by display date — **LOW**
`build/Tags.hs:371-377`. `buildPaginateWith (sortAndGroupAt tagPageSize)`
partitions via `sortRecentFirst` (creation date), then each page re-sorts
with `recentFirstByDisplay` (revision-aware). A recently revised old item
stays on a late page but jumps to its top — cross-page ordering is not
monotone. Only fires above the 150-item threshold.
### 2.12 `fill:#000` replacement corrupts longer hex colors — **LOW**
`build/Filters/Score.hs:118-133` (and `Filters/Viz.hs` `processColors`).
The 6-digit pass protects only `#000000`; for `fill:#000080` the 3-digit
pass produces `fill:currentColor80` — invalid CSS, silently mangled SVG.
Quoted attribute forms are safe; only unquoted style-property forms are
exposed.
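The boundary problem can be illustrated outside Haskell. The following is a minimal Python sketch, not the code in `Score.hs` or `Viz.hs`; the helper name `replace_black_fill` is hypothetical. It assumes the intended behavior is to rewrite only pure-black fills, and uses a negative lookahead so that a 3-digit match followed by further hex digits is rejected.

```python
import re

# Match fill:#000 or fill:#000000 only when no further hex digit follows,
# so longer colors such as #000080 are left untouched.
_BLACK_FILL = re.compile(r"fill:#(?:000000|000)(?![0-9a-fA-F])")


def replace_black_fill(svg: str) -> str:
    """Swap pure-black fills for currentColor (hypothetical helper)."""
    return _BLACK_FILL.sub("fill:currentColor", svg)


if __name__ == "__main__":
    for sample in ("fill:#000;", "fill:#000000;", "fill:#000080;"):
        print(sample, "->", replace_black_fill(sample))
```

A single pattern with a trailing guard also removes the need for a separate "protect the 6-digit form first" pass.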
### 2.13 Source-level preprocessors rewrite inside fenced code blocks — **LOW**
`build/Filters/Wikilinks.hs:24-31`, `Filters/Transclusion.hs:18-20`,
`Filters/EmbedPdf.hs`. All run on the raw source before Pandoc parses
fences: `[[anything]]` in a code block becomes a link; a code-block line
that is exactly `{{slug}}` or `{{pdf:...}}` becomes raw HTML.
Transclusion's comment ("prevents accidental substitution inside prose or
code") is false for full-line directives in code blocks. A live foot-gun
for a site that documents its own syntax (ozymandias.md does exactly
this).
### 2.14 `domainIcon` matches substrings of the whole URL, not the host — **LOW**
`build/Filters/Links.hs:120-153`. ``"x.com" `T.isInfixOf` url`` etc., so
`https://example.org/why-x.com-failed` gets the Twitter icon. Contradicts
the strict-hostname discipline `isExternal` documents at lines 95-101 of
the same file. Cosmetic (icon only).
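For contrast, host-based matching might look like the sketch below. It is written in Python purely as an illustration of the hostname discipline, not as a port of `domainIcon`; `host_matches` is a hypothetical name, and treating subdomains as matches is an assumption.

```python
from urllib.parse import urlsplit


def host_matches(url: str, domain: str) -> bool:
    """True when the URL's host is `domain` or a subdomain of it."""
    host = (urlsplit(url).hostname or "").lower()
    return host == domain or host.endswith("." + domain)


if __name__ == "__main__":
    print(host_matches("https://x.com/someone", "x.com"))
    print(host_matches("https://example.org/why-x.com-failed", "x.com"))
```

Comparing against the parsed host means path text can never select an icon, and the leading-dot check keeps `notx.com` from matching `x.com`.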
### 2.15 `gsubRoute "content/"` strips every occurrence, not just the prefix — **LOW**
`build/Site.hs:171,357,417` etc. Hakyll's `gsubRoute` is replace-all; a
co-located directory literally named `content` would be silently mangled
(`content/essays/slug/content/data.csv` → `essays/slug/data.csv`). Same
for `gsubRoute "static/"`. Improbable but silent.
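The difference between replace-all and prefix-only stripping is easy to see in a small Python analogue. This is only an illustration of the two semantics, using the example path from above; `strip_prefix` is a hypothetical helper, not a Hakyll function.

```python
def strip_prefix(path: str, prefix: str) -> str:
    """Remove `prefix` only when it starts the path."""
    return path[len(prefix):] if path.startswith(prefix) else path


if __name__ == "__main__":
    path = "content/essays/slug/content/data.csv"
    print(path.replace("content/", ""))    # replace-all semantics
    print(strip_prefix(path, "content/"))  # prefix-only semantics
```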
### 2.16 `existsCached` memoizes non-existence for the process lifetime — **LOW**
`build/Filters/SourceRefs.hs:160-166`. Under `make watch`, a source file
created after first reference stays cached as absent until restart.
### 2.17 Core NITs
- `build/Site.hs:42-44`: comment says "eight portals"; the list has nine.
Echoed at Site.hs:606 ("the eight") vs line 657's "nine times".
- `build/Site.hs:866-877`: random-pages.json comment says "essays + blog
posts only" but the rule loads fiction and flat poetry too; uses
flat-only `content/poetry/*.md` while the epistemic rule uses
`allPoetry` — collection poems are epistemic-indexed but never
randomizable.
- `build/Utils.hs:64-73`: `authorSlugify` comment claims runs of spaces
collapse; code maps each space (`"A B"` → `"a--b"`). Consistent
everywhere, so links work; comment wrong.
- `build/Utils.hs:31-32`: `readingTime` truncates (`div 200`) — 399 words
reports "1 min"; comment implies ceiling semantics.
- `build/Pagination.hs:42` + `build/Site.hs:77-82`: hardcoded pattern
literals duplicate `Patterns.hs`, defeating that module's stated purpose
(Patterns.hs:6-10).
- `build/Contexts.hs:174-180`: plain `tagLinksField` returns an empty list
rather than `noResult`, so `$if(item-tags)$` is true and templates emit
empty tag wrappers (author-index.html, item-card.html).
- `build/Tags.hs:296-304`: `tagItemCtx` composes `defaultContext`, not
`siteCtx`, so `$if(has-monogram)$` never fires on tag pages — monograms
render on new.html/library but silently never on tag indexes.
- `build/Contexts.hs:485-492`: `dotsField` comment says "1–5" but accepts
0 (`max 0 (min 5 n)`), so `importance: 0` renders five empty circles.
- `build/Contexts.hs:375-381`: `descriptionField` doc says `noResult`;
code uses `fail` — behaviorally fine under Hakyll 4.16 `$if$` (verified
against Hakyll 4.16.7.1 source) but logs `[ERROR]` debug noise per
abstract-less page. Same in `abstractField`, `summaryField`,
`bibliographyField`.
- `build/Filters/Images.hs:233-234`: `webpSrc` interpolated into `srcset`
unescaped while sibling `src` goes through `esc`.
- `build/Filters/Links.hs:37-46,63-69`: internal PDF links double-classified
(`pdf-link` + `link-internal` chrome) despite the "no overlap" comment.
- `build/Filters/Smallcaps.hs:31-34` + `Filters/Archive.hs:42-44`:
"headers are skipped" only at top level; a Header nested in a
Div/BlockQuote is processed, contradicting the comments.
Verified clean: no unguarded `head`/`fromJust`/`read`/`!!` hazards in the
core modules; filter composition order matches its documenting comments;
Hakyll 4.16.7.1 `$if$` treats both `fail` and `noResult` as false.
---
## 3. Haskell build code — feature modules
### 3.1 Stats heatmap day-of-week off-by-one: Sunday clipped out of the SVG — **MED**
`build/Stats.hs:185,300,317`. `dowOf d = fromEnum (dayOfWeek d) -- Mon=0..Sun=6`
— but `time-1.12.2` is ISO-numbered (verified:
`map fromEnum [Monday .. Sunday] == [1..7]`). So Sunday lands at y=106 while
`svgH` = 104 — every Sunday cell is clipped out of the viewBox and grid
row 0 is permanently blank. Relatedly, `weekStart` returns the previous
*Sunday* (and for a Sunday, 7 days back), not the "first Monday on or
before" its comment claims; builds run on a Sunday also clip the newest
column horizontally.
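The intended arithmetic can be sketched in Python, whose `date.isoweekday()` uses the same ISO numbering (Monday=1 through Sunday=7) that the finding reports for `fromEnum`. This is an illustration of the fix, not the Haskell code; `dow_row` and `week_start` here are stand-ins named after the functions discussed above.

```python
from datetime import date, timedelta


def dow_row(d: date) -> int:
    """Zero-based grid row, Mon=0..Sun=6, from the ISO number (Mon=1..Sun=7)."""
    return d.isoweekday() - 1


def week_start(d: date) -> date:
    """First Monday on or before d."""
    return d - timedelta(days=dow_row(d))


if __name__ == "__main__":
    sunday = date(2024, 1, 7)
    print(dow_row(sunday), week_start(sunday))
```

Subtracting one from the ISO number keeps every row inside a seven-row grid, and deriving `week_start` from the same row index keeps the two functions from disagreeing about which day starts the week.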
### 3.2 `Commonplace.hs` uses `Char8.pack` — non-ASCII YAML corruption — **MED**
`build/Commonplace.hs:143`. `Y.decodeEither' (BS.pack raw)` with
`Data.ByteString.Char8` truncates each `Char` to 8 bits — the exact hazard
`build/Now.hs:249-253` documents and fixes with `TE.encodeUtf8`.
`data/commonplace.yaml` is currently pure ASCII, so latent — but a
commonplace book of quotations is the likeliest file to acquire an em-dash
or curly quote, which will then either fail the YAML parse or publish
mojibake.
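The truncation is easy to reproduce in miniature. The Python sketch below mimics keeping only the low 8 bits of each code point and compares it with UTF-8 encoding; `char8_pack` is a hypothetical stand-in for the behavior described, not a binding to the Haskell function.

```python
def char8_pack(s: str) -> bytes:
    """Mimic per-character truncation to the low 8 bits of each code point."""
    return bytes(ord(c) & 0xFF for c in s)


if __name__ == "__main__":
    quote = "wait \u2014 what"        # contains U+2014 EM DASH
    print(char8_pack(quote))           # the dash collapses to one control byte
    print(quote.encode("utf-8"))       # the dash becomes three UTF-8 bytes
```

ASCII input is identical under both encodings, which is why the problem stays invisible until the first non-ASCII character arrives.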
### 3.3 Backlinks: links inside tight lists are invisible — **MED**
`build/Backlinks.hs:220-226`. `extractLinksWithContext`'s `go` handles
`Para`, `BlockQuote`, `Div`, `BulletList`, `OrderedList`, then `go _ = []`.
Tight list items (the default `- item` form) are `Plain` blocks, not
`Para`, so recursion into list children yields nothing. Every internal
link written in a tight list never produces a backlink. `Header`, `Table`,
and `DefinitionList` blocks are likewise skipped. The doc comment implies
coverage it doesn't deliver.
### 3.4 Stability "age" is the first→last commit span, not time since first commit — **MED**
`build/Stability.hs:89-93,99-112`. Docs say "age in days since first
commit," but `classify (length dates) (daySpan (last dates) newest)`
computes the span between first and most recent *commit*, with no
reference to today. A piece written in a one-week burst years ago reports
"volatile" forever; time passing without commits can never increase
stability. Either the comment or the metric is wrong.
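The two candidate metrics differ only in their right-hand endpoint. A minimal Python sketch of the distinction follows; the function names are hypothetical and the dates are invented for illustration.

```python
from datetime import date
from typing import Sequence


def commit_span_days(dates: Sequence[date]) -> int:
    """Days between the first and the most recent commit."""
    return (max(dates) - min(dates)).days


def age_days(dates: Sequence[date], today: date) -> int:
    """Days between the first commit and today."""
    return (today - min(dates)).days


if __name__ == "__main__":
    burst = [date(2020, 1, 1), date(2020, 1, 8)]   # one-week burst
    print(commit_span_days(burst), age_days(burst, date(2026, 1, 1)))
```

Only the second function grows as time passes without commits, which is the property the documented "age" wording implies.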
### 3.5 Frontmatter `history:` assumed newest-first; WRITING.md documents oldest-first — **MED**
`build/Stability.hs:204-217,299-336` vs `WRITING.md:105-109`.
`loadVersionHistory` keeps authored order and all range fields treat the
head as newest (`es@(newest:_) -> let oldest = last es`). Git history is
newest-first, but WRITING.md's `history:` example is oldest-first. With
the documented ordering, `version-history-range` renders reversed
("14 March 2026 – 1 March 2026"), `range-start` returns the newest date,
and `version-history-primary` shows the three *oldest* entries.
### 3.6 Archive manifest→provenance join is exact-string, rest of system is normalized — **MED**
`build/Archive.hs:269`. `Map.lookup (meUrl me) provByUrl` joins on the raw
URL; everywhere else equivalence is `normalizeUrl` (ArchiveIndex
filtering, dup detection, ARCHIVE.md:189-192). Editing a manifest URL to a
normalization-equivalent form (`http`→`https`, trailing slash, tracking
param) silently unpublishes `/archive/<slug>/` while ArchiveIndex's
normalized filter keeps the slug active — links keep pointing at a 404.
### 3.7 Photography `buildPin` computes wrong slug/thumb/title for flat entries — **MED**
`build/Photography.hs:354,362`. `slug = takeFileName (takeDirectory fp)`:
for a flat `content/photography/foo.md` this yields `"photography"`, so
map.json gets `"slug": "photography"`, the title fallback is wrong, and
`thumb = "/photography/photography/<p>"` 404s (flat-single assets route to
`/photography/<asset>`). PHOTOGRAPHY.md:214 explicitly supports flat
singles. Latent, since `content/photography/` currently has only `index.md`,
but it breaks on the first geo-tagged flat single.
### 3.8 `geo-precision` fails open: a typo'd "hidden" publishes coordinates — **MED**
`build/Photography.hs:347-349,312-320`. Only the exact string matches
(`(_, Just "hidden", _) -> return Nothing`); any other value (e.g.
`Hidden`, `hiddn`) falls into `roundCoord`, whose catch-all treats unknown
values as `city` (~10 km rounding) — publishing coordinates the author
meant to suppress. Contradicts the file's own privacy comment (lines
287-289) and the fail-closed precedent for `visibility:` in
`build/Archive.hs:77-83`.
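A fail-closed parse treats anything unrecognized the same as `hidden`. The Python sketch below illustrates the shape of that rule; it is not a port of `roundCoord`, the function name is hypothetical, and `KNOWN_LEVELS` lists only `city` because that is the only publishing level named in this finding (the real set lives in `Photography.hs`).

```python
from typing import Optional

# Assumption: only "city" is named above; the real list of publishing
# levels is defined in the Haskell source and is not reproduced here.
KNOWN_LEVELS = {"city"}


def published_precision(raw: str) -> Optional[str]:
    """Return the precision level to publish at, or None to suppress.

    "hidden" and every unrecognized value (typos, wrong case) suppress.
    """
    return raw if raw in KNOWN_LEVELS else None


if __name__ == "__main__":
    for value in ("city", "hidden", "Hidden", "hiddn"):
        print(value, "->", published_precision(value))
```

With an allowlist, a typo costs a missing map pin rather than a published location.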
### 3.9 Archive state is process-lifetime cached — `watch` goes stale — **LOW**
`build/ArchiveIndex.hs:123-146` + `build/Archive.hs:304`.
`activeUrls`/`rawIndex`/`rawState` are NOINLINE `unsafePerformIO` CAFs read
once per process, and `archiveRules` reads the manifest in `preprocess`.
Under `site watch`, edits to `manifest.yaml`, `removed.yaml`, or the
regenerated state JSONs are never re-read until restart. One-shot builds
unaffected.
### 3.10 Pinned pages render raw ISO in `$last-reviewed$` — **LOW**
`build/Stability.hs:166-170`. The git branch formats via `fmtIso`
("1 May 2026"); the IGNORE.txt-pinned branch returns the frontmatter value
verbatim ("2026-05-01") — inconsistent display formatting.
### 3.11 Empty/all-comments `manifest.yaml` halts the build — **LOW**
`build/Archive.hs:158-170`. An empty YAML stream decodes as `Null`, which
fails to parse as `[ManifestEntry]` and takes the `exitFailure` branch —
draining the manifest to zero entries is fatal rather than the empty
archive the absent-file branch supports.
### 3.12 Backlinks `normaliseUrl` misses directory-form canonical URLs — **LOW**
`build/Backlinks.hs:275-281`. Strips `.html` but not
`index.html`/trailing slash: a page routed `essays/foo/index.html` keys as
`/essays/foo/index`, but a body link authored `/essays/foo/` doesn't
match — backlink silently dropped. `build/SimilarLinks.hs:97-99` handles
exactly this case and its comment flags the divergence.
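One way to make the two forms key identically is to fold `index.html` and trailing slashes before stripping `.html`. The following Python sketch shows that normalization as an illustration only; it is not the logic in `SimilarLinks.hs`, and it assumes the site never has both `foo.html` and `foo/index.html` at the same level.

```python
def normalise_page_url(url: str) -> str:
    """Map flat, index.html and trailing-slash forms to one key."""
    path = url.split("#", 1)[0].split("?", 1)[0]
    if path.endswith("/index.html"):
        path = path[: -len("index.html")]      # keep the trailing slash for now
    elif path.endswith(".html"):
        path = path[: -len(".html")]
    if len(path) > 1 and path.endswith("/"):
        path = path.rstrip("/")
    return path


if __name__ == "__main__":
    for u in ("/essays/foo/index.html", "/essays/foo/", "/essays/foo.html", "/"):
        print(u, "->", normalise_page_url(u))
```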
### 3.13 SimilarLinks PDF viewer URL not percent-encoded — **LOW**
`build/SimilarLinks.hs:155-164`.
`viewerUrl = "/pdfjs/web/viewer.html?file=" ++ escapeHtml raw`
`escapeHtml` handles HTML metachars only; a path containing `&`, `?`, `#`,
or spaces breaks the `file=` query value.
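The two escaping layers are independent: the path must first be percent-encoded for the query string, and the finished URL then HTML-escaped for the attribute. A Python sketch of that ordering follows; `viewer_url` is a hypothetical stand-in, and leaving `/` unencoded in the query value is a choice made here, not something the source specifies.

```python
from html import escape
from urllib.parse import quote


def viewer_url(raw_path: str) -> str:
    """Build the viewer link: percent-encode first, HTML-escape second."""
    encoded = quote(raw_path, safe="/")
    return escape("/pdfjs/web/viewer.html?file=" + encoded, quote=True)


if __name__ == "__main__":
    print(viewer_url("/papers/a&b #1.pdf"))
```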
### 3.14 Photography feed thumbnails only for directory-form entries — **LOW**
`build/Photography.hs:449-453`. `imgTag` requires `isDir`; flat singles
and series children (`<series>/<photo>.md`) get text-only feed entries,
against PHOTOGRAPHY.md's "thumbnails embedded inline" (lines 36, 445) and
the feed's deliberate inclusion of series children.
### 3.15 Marks: missing confidence/evidence renders a literal "0 TRUST" — **LOW**
`build/Marks.hs:272-278,565`. `computeTrust _ _ = 0` with a comment
claiming the figure "collapses to the bare frame," but
`renderEpistemicFigure` unconditionally calls `renderTrustLabel`, so a
piece with `status:` but no `confidence`/`evidence` (a case MARKS.md:696
says should render) displays a prominent center "0" — indistinguishable
from an authored zero-trust score.
### 3.16 Feature-module NITs
- `build/Catalog.hs:228-235`: two distinct unknown categories render as
adjacent duplicate "Other" sections (equal rank, `groupBy` on raw
string).
- `build/Stats.hs:754-777`: `pageTOC` comment says "nine h2 sections";
lists eleven (matching the eleven rendered).
- `build/SimilarLinks.hs:51-54`: comment says "the template caps the
display"; the code caps it (`take maxSimilar` at line 80).
- `build/Stats.hs:169-171`, `build/Archive.hs:564-569`: "median" is the
upper-median for even-length lists.
- `build/Backlinks.hs:133-153`: protocol-relative `//host/path` URLs pass
`isPageLink` and pollute backlinks.json.
- `build/BibExtras.hs:75-98`: `@string`/`@comment`/`@preamble` blocks
parsed as citekey entries — only consequential on a citekey/macro-name
collision.
Verified clean: Marks tick positions/axis order/radii match MARKS.md §3;
proved-confidence trust substitution matches §4.3; Archive's fail-closed
`visibility` validation, removed.yaml conflict rejection, and double-sided
SHA-256 verification all match ARCHIVE.md.
---
## 4. Python & shell tooling
### 4.1 `data/embed-cache-pages.npz.tmp.npz` orphan: explained; cleanup + ignore gaps — **MED**
The orphan (mtime May 26) is the fossil of a fixed bug: an earlier
embed.py passed a bare path to `np.savez_compressed`, numpy appended
`.npz` (verified in numpy's `_savez` source), and the subsequent
`os.replace` raised FileNotFoundError, stranding the file. The current
file-handle code (`tools/embed.py:173-183`) is correct, but: (a) nothing
deletes the stale orphan — **delete it, don't commit it**; (b) the tmp
write has no try/finally, so any mid-write exception strands
`embed-cache-pages.npz.tmp`; (c) the new `.gitignore` entry is exact-path
(`data/embed-cache-pages.npz`) and covers neither `.tmp` nor `.tmp.npz`
variants — widen to `data/embed-cache-pages.npz*`; (d) the fixed tmp name
means two concurrent runs interleave writes.
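Points (b) and (d) can be addressed together. The sketch below is one possible shape for such a writer in Python, not the code in `tools/embed.py`; `write_atomically` is a hypothetical name. It uses `tempfile.mkstemp` for a unique temp name in the target directory, fsyncs before the rename, and removes the temp file on any failure.

```python
import os
import tempfile


def write_atomically(path: str, data: bytes) -> None:
    """Write data to path via a unique temp file and os.replace."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(
        dir=directory, prefix=os.path.basename(path) + ".", suffix=".tmp"
    )
    try:
        with os.fdopen(fd, "wb") as fh:
            fh.write(data)
            fh.flush()
            os.fsync(fh.fileno())   # make the bytes durable before renaming
        os.replace(tmp, path)       # atomic within one filesystem
    except BaseException:
        try:
            os.unlink(tmp)          # never strand the temp file
        except FileNotFoundError:
            pass
        raise
```

Creating the temp file in the same directory as the target keeps `os.replace` on one filesystem, and the `.tmp` suffix means a widened ignore pattern such as the one suggested in (c) would still cover any leftovers.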
### 4.2 Corrupt embed cache crashes instead of being discarded — **MED**
`tools/embed.py:154`. The discard path catches
`(OSError, KeyError, ValueError)`, but `np.load` on a truncated `.npz`
raises `zipfile.BadZipFile` (verified MRO: `BadZipFile → Exception`), and
`EOFError` is also uncaught. A half-written cache (exactly what §4.1(b)
can produce) makes every subsequent build print "Warning: embedding
failed" and leaves similar-links/semantic index stale until the file is
manually deleted — the opposite of the docstring's "unreadable →
discarding" contract.
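Widening the discard path amounts to adding the two missing exception types. The Python sketch below shows the idea with the loader passed in as a parameter so it runs without numpy; `load_cache_or_none` and its `loader` argument are hypothetical, and in the real tool the loader would be whatever wraps `np.load`.

```python
import zipfile
from typing import Any, Callable, Optional

# The three types already caught, plus the two the finding says escape.
CORRUPT_CACHE_ERRORS = (
    OSError, KeyError, ValueError, EOFError, zipfile.BadZipFile,
)


def load_cache_or_none(path: str, loader: Callable[[str], Any]) -> Optional[Any]:
    """Return the loaded cache, or None when the file is unreadable."""
    try:
        return loader(path)
    except CORRUPT_CACHE_ERRORS:
        return None   # caller rebuilds the cache from scratch
```

Returning `None` lets the caller fall through to a full rebuild, which matches the "unreadable, so discard" contract described in the docstring.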
### 4.3 embed.py staleness check structurally defeated by stamp-build-time — **MED**
`tools/embed.py:195-200` + `Makefile:68`. `needs_update()` compares
`_site/**/*.html` mtimes against embed's outputs — but the build order is
`embed.py` → `stamp-build-time.py _site`, and the stamper rewrites the
footer timestamp in essentially every HTML file each build. So every page
is always newer than embed's outputs and the "skip if fresh" fast path
never fires: the full paragraph-embedding pass (and model load) runs on
every build. The new page cache papers over half the cost; the paragraph
pass pays full price every time. Related (`tools/embed.py:297-299`):
model/config changes never invalidate outputs — currently masked by this
bug; fixing one exposes the other.
### 4.4 archive.py writes provenance/index/state non-atomically — **MED**
`tools/archive.py:718-721,734-737,953-957,1077-1080`. All plain
`write_text()`. An interrupt mid-write truncates `PROVENANCE.json`; the
next build's `json.loads` (line 642) raises an unhandled
`JSONDecodeError` — and a truncated provenance is indistinguishable from
corruption in a tool whose whole contract is integrity checking. embed.py
got atomic-write helpers; archive.py did not.
### 4.5 download-leaflet.sh: checksum verification bypassable — **MED**
`tools/download-leaflet.sh:43-47,90`. The early-exit skip checks file
existence only (download-model.sh re-verifies on its skip path), and
`curl -o "$target"` writes directly to the final path: a download that
*fails* `verify_or_warn` aborts via `set -e` *after* the bad file is in
place, and the next run's existence check accepts it permanently. A
MITM'd unpkg.com download survives one failed run and is silently
vendored on the next.
### 4.6 Other download/convert scripts leave partial files in final paths — **LOW**
`tools/download-model.sh:84`: interrupted curl leaves a partial
`model_quantized.onnx`; caught today only because model-checksums.sha256
pins all five files — any unpinned file would persist forever. Use
`-o "$dst.part" && mv`. `tools/convert-images.sh:33`: interrupted cwebp
leaves a partial `.webp` that the `-nt` staleness gate then skips forever
— a truncated WebP ships until manually deleted.
### 4.7 archive.py robustness gaps — **LOW**
- `tools/archive.py:788,795-799`: provenance missing the `artifact` key
makes `prev_artifact == slug_dir`, then `sha256_of` raises an uncaught
`IsADirectoryError` instead of the structured "prior snapshot
incomplete" error.
- `tools/archive.py:614-617,938-940,1066-1068`: non-dict manifest entries
(`- https://example.com` instead of `- url: ...`) crash with
`AttributeError: 'str' object has no attribute 'get'`.
- `tools/archive.py:896`: `wayback_save` concatenates the raw URL
(contrast `wayback_lookup` at 909, which uses `quote(url, safe="")`).
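For the non-dict manifest case, a small validation step would turn the `AttributeError` into a structured message. The following is a sketch under assumptions: `entry_url` is a hypothetical helper, and `url` is the only key taken from the finding above.

```python
from typing import Any


def entry_url(entry: Any) -> str:
    """Return the entry's URL, or raise ValueError with a readable message."""
    if not isinstance(entry, dict):
        raise ValueError(
            "manifest entry must be a mapping with a 'url' key, "
            f"got {type(entry).__name__}: {entry!r}"
        )
    url = entry.get("url")
    if not isinstance(url, str) or not url:
        raise ValueError(f"manifest entry has no usable 'url': {entry!r}")
    return url


if __name__ == "__main__":
    print(entry_url({"url": "https://example.com"}))
    try:
        entry_url("https://example.com")   # the `- https://...` mistake
    except ValueError as err:
        print("rejected:", err)
```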
### 4.8 add-popup-source.sh: dead CSP reminder + unvalidated nginx interpolation — **LOW**
`tools/add-popup-source.sh:214`: the connect-src reminder gates on
`[[ "$NEEDS_PROXY" -eq 0 && -n "$UPSTREAM_HOST" ]]`, but `UPSTREAM_HOST`
is only set in the `NEEDS_PROXY -eq 1` branch (lines 124-131) — the
reminder can never print, and the no-proxy case is exactly when it's
needed (the provider will be CSP-blocked with no hint). Line 71: `NAME`
from a free-text prompt is interpolated into
`location /proxy/$NAME/`/`set $upstream_$NAME` with no
`^[a-z0-9-]+$` validation (import-photo.sh validates; this doesn't).
### 4.9 refreeze.sh deletes the freeze before the replacement succeeds — **LOW**
`tools/refreeze.sh:13-16`. `rm -f "$FREEZE"` then `cabal freeze`; a failed
resolve leaves no freeze file (recoverable via git, but write-temp-then-move
is safer).
### 4.10 embed.py / atomic-write NITs — **LOW/NIT**
`tools/embed.py:109-115`: `atomic_write_bytes` uses a fixed `.tmp` name
(concurrent-run collision) and no `fsync` before `os.replace` (power loss
can leave an empty target). Same pattern in `_atomic_write_yaml` of
extract-exif.py:377, extract-palette.py:65, extract-dimensions.py:65.
`tools/embed.py:144`: NpzFile never closed — use
`with np.load(...) as npz:`.
### 4.11 Tooling NITs
- `tools/import-photo.sh:147-155`: on `mogrify -strip` failure the
EXIF-laden JPEG (GPS, serials) remains under `content/`, where
`make build`'s `git add content/` could auto-commit it. Delete `$TARGET`
on that failure path.
- `tools/hooks/pre-commit-marks.sh:28-31`: `awk '{ print $2 }'` truncates
paths with spaces; the `status:` probe reads the working tree, not the
staged blob. Advisory-only hook.
- `tools/preset-signing-passphrase.sh:30`: `echo -n "$PASSPHRASE"` eats a
passphrase starting with `-e`/`-n`/`-E`; use `printf '%s'`.
- `tools/stamp-build-time.py:52-54`: in-place non-atomic rewrite of
`_site/` HTML.
- `tools/archive.py:244`: `pdftotext` without `--`; a slug starting with
`-` parses as an option. Same in extract-exif.py:159.
- `tools/monolith-version.txt` records a sha256 (matches the binary
today, verified) but `find_monolith()` never checks it.
Verified clean: sign-site.sh (atomic sig writes, post-pass manifest
verification); compress-assets.sh and download-pdfjs.sh (mktemp + EXIT
trap, hash verified before extraction); audit-marks.py, viz_theme.py,
extract-dimensions.py, extract-palette.py; embed.py's faiss `-1` padding
is safely filtered; `uv lock --check` passes; model-checksums.sha256 pins
all five model files.
---
## 5. Frontend JavaScript
### 5.1 Score-reader pages never restore theme/settings — **MED**
`templates/score-reader-default.html:10` + `static/js/theme.js:12-13`. The
template loads `theme.js` without `utils.js` (unlike head.html:66-67), so
`window.lnUtils.safeStorage` is undefined and theme/text-size/focus-mode/
reduce-motion all silently fail to restore — a dark-theme user gets a
light flash-and-stay on every score page. Compounding: settings.js (line
15; the template does render the settings toggle) falls back to its no-op
store, so theme picks made on score pages never persist either.
### 5.2 search-filters.js: epistemic filters silently bypass clean-URL pages — **MED**
`static/js/search-filters.js:117-125`. `normUrl()` returns `u.pathname`
verbatim and looks it up in `epistemicMeta[url]`. Verified:
`_site/data/epistemic-meta.json` keys include
`/essays/beyond-comorbidity-indices/index.html` while rendered result
links use `/essays/beyond-comorbidity-indices/`. The lookup misses,
`passes(null)` returns true ("no metadata = don't filter"), so every
directory-style page bypasses all active epistemic filters. Flat `.html`
pages match fine, which hides the bug.
### 5.3 viz.js ignores the cappuccino theme — **MED**
`static/js/viz.js:94-99`. `isDark()` knows only
`'dark'`/`'light'`/OS-preference, but theme.js/settings.js support
`'cappuccino'` — a dark-brown theme (`--bg: #553a28`, base.css:203). With
OS-light + cappuccino, charts render the LIGHT config (near-black marks
and axis labels) on a dark background.
### 5.4 collapse.js localStorage keys collide across pages — **MED**
`static/js/collapse.js:44,83`. Key is
`'section-collapsed:' + heading.id` with no pathname namespace (contrast
annotations.js). Pandoc auto-slugs (`#introduction`, `#background`) recur
across essays, so collapsing "Introduction" on one essay collapses it
everywhere. Also uses raw `localStorage` rather than
`lnUtils.safeStorage`.
### 5.5 semantic-search.js: stale-response race + duplicate index fetch — **MED**
`static/js/semantic-search.js:117-144`. `runSearch` has no generation
token; overlapping queries render in promise-resolution order, so an
older query's hits can replace a newer one's (with `setStatus('')`
masking it). `loadIndex()` (42-59) has no in-flight-promise dedup (unlike
`loadModel`'s `loadModelPromise`), so concurrent first searches fetch
`semantic-index.bin` + `semantic-meta.json` twice.
### 5.6 lightbox.js: aria-modal with no focus trap, no keyboard activation — **MED**
`static/js/lightbox.js`. Overlay sets `role="dialog"` +
`aria-modal="true"` but has no Tab handling (gallery.js's `trapTab` at
235-257 shows the in-repo pattern) — focus walks into the obscured page.
Trigger images get only a `click` listener and no `tabindex`/keydown, so
keyboard users can't open it; `close()` focuses a non-focusable `<img>`,
which no-ops.
### 5.7 Frontend LOWs
- `static/js/gallery.js:122-125,270-275`: math/score overlay is
click-only (no role/tabindex/keydown); `closeOverlay()` focus-returns
to a non-focusable div — focus drops to `<body>`.
- `static/js/popups.js:478,515`: the Wikipedia provider's
`decodeURIComponent` runs synchronously before the `.catch` attaches —
a malformed percent sequence in a link path throws an uncaught
`URIError` per hover.
- `static/js/popups.js:359,390`: fetched monogram SVG injected via
`innerHTML` unescaped — the single unsanitized path in an otherwise
fully escaped pipeline. Build-authored content, so not exploitable
today; the comment acknowledges the trust assumption.
- `static/js/citations.js`: dead file — no template loads it; popups.js
supersedes it. If ever re-added it would double-bind and inject
bibliography innerHTML without popups.js's cloned-node hardening.
Delete.
- `static/js/nav.js:26,30-31`: raw `localStorage` unguarded; if storage
access throws, the throw lands before `toggle.addEventListener`,
leaving the Portals toggle completely dead (utils.js exists precisely
for this).
- `static/js/annotations.js:209-215`: marks are mouse-only; the tooltip's
Delete button is unreachable by keyboard (only recourse is the
all-or-nothing "Clear Annotations").
- `static/js/search.js:10`: unguarded `new PagefindUI(...)` — if the
pagefind bundle 404s, the ReferenceError aborts the whole handler
including the `?q=` pre-fill that the selection-popup "Here" flow
depends on.
- `static/js/semantic-search.js:55-56,96-107`: no
`vectors.length === meta.length * DIM` consistency check — a stale
CDN-cached mismatch yields NaN scores and silently garbage ranking.
(Current files verified consistent: 1,256,448 bytes = 818 × 384 × 4.)
- `static/js/transclude.js:149-151` + `collapse.js:111-114`: nested
transcludes render a bare placeholder (no rescan of injected content);
`reinitCollapse` is not idempotent (would stack toggle buttons if ever
called twice on the same container).
- `static/js/popups.js:985-988,1009-1014`: `daysBetween` uses `Math.abs`,
so future dates render "N days ago" (now.js:17 handles this correctly).
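The consistency check named in the semantic-search bullet reduces to one multiplication. The Python sketch below shows it as a possible build-side counterpart; it is not existing code, `index_is_consistent` is a hypothetical name, and the 4-byte element size is inferred from the "× 4" in the verified figure above.

```python
DIM = 384                 # embedding width stated in the finding
BYTES_PER_VALUE = 4       # assumption: 32-bit floats, per the "x 4" above


def index_is_consistent(index_bytes: int, meta_entries: int, dim: int = DIM) -> bool:
    """True when the binary index holds exactly one vector per meta entry."""
    return index_bytes == meta_entries * dim * BYTES_PER_VALUE


if __name__ == "__main__":
    print(index_is_consistent(1_256_448, 818))
```

Running the same comparison in the browser before ranking would let a mismatched pair fail loudly instead of producing NaN scores.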
### 5.8 Frontend NITs
- `static/js/copy.js:20-22,39`: code-less `<pre>` fallback copies the
"copy" button label along with content.
- `static/js/score-reader.js:50`: URL rewritten to `?p=1` on every load
even without a `?p=` param.
- `static/js/search-filters.js:271`: `parseInt(v,10) || 0` turns junk
threshold input into an active ≥0 filter that matches everything.
- `static/js/selection-popup.js:90-95`: shift-keyup while typing capitals
in the annotation picker re-summons the selection toolbar over it.
Verified clean: the semantic-search ↔ embed.py contract post-model-split
(DIM 384, 818-entry meta, no prefix for MiniLM — the nomic
`search_document:` prefix is confined to the build-only page path); XSS
escaping across semantic-search, popups providers, map tooltips,
annotations (sole exception §5.7 monogram); theme.js ↔ settings.js
storage schema identical; all JS selector contracts against templates
(including the uncommitted head/nav edits); popups/sidenotes
double-init guards; settings.js and gallery.js focus traps.
---
## 6. Templates & content
### 6.1 Draft in undocumented location is never built — **MED**
`content/drafts/inclusionist-manifesto.md`. WRITING.md:34 says drafts go
under `content/drafts/essays/`; `draftEssayPattern`
(`build/Patterns.hs:46-49`) matches only that, so this file is invisible
even to `make watch`/`make dev` — silently orphaned.
### 6.2 SIMD/PQC essay `repository:` URL 404s — **MED**
`content/essays/where-does-simd-help-post-quantum-cryptography/index.md:24`.
`https://git.levineuwirth.org/where-simd-helps` is missing the owner
segment — verified HTTP 404, while the sibling essay's
`.../neuwirth/beyond_comorbidity_indices` returns 200.
### 6.3 Tracked drafts contradict the gitignore policy — **MED**
`.gitignore:88` ignores `content/drafts/` as local-only "working notes,"
but `git ls-files -i -c` shows four tracked drafts
(`digital_progeny.md`, `modern_idolatry.md`, `test-essay.md`,
`university_care.md`) — ignore rules don't untrack, so edits are
auto-staged by `make build` and pushed publicly by deploy. The over-broad
`**/.env.*` pattern also matches the tracked `.env.example`.
### 6.4 Template/content LOWs and NITs
- `content/colophon.md:5`: `modified:` is dead frontmatter — nothing
reads it; `$date-modified$` (page-footer.html:108) is Hakyll's
`dateField` over the `date` key.
- Seven files end frontmatter with a valueless `confidence-history:`
(YAML null; WRITING.md:97 documents a list of ints) — harmless, but
`content/essays/scaling_outage.md` also retains the full WRITING.md
scaffold comments in a published essay.
- `static/images/canto31.jpg`: still 4.0 MB (prior-audit §6.1 unfixed).
- `templates/blog-post.html:25,34`: `id="similar-links"` appears twice in
mutually exclusive `$if$` branches — safe, fragile under edit.
- `content/drafts/essays/digital_progeny.md`: title duplicates the
published "The Specification Dilemma" — stale draft.
- Frontmatter flags `home:`/`library:`/`links:`/`search:`/`portal:` are
consumed (head.html CSS gates, default.html:6 `data-portal`) but
undocumented in WRITING.md.
Verified clean: all `$partial(...)$` includes resolve; all ~140 distinct
template variables have context providers; no missing `alt` attributes,
tag-balance failures, or within-page duplicate IDs in composed pages; all
26 CSS files referenced by head.html exist; sampled enum values across
all sections are legal per WRITING.md and Contexts.hs validation lists.
---
## 7. Documentation / spec drift (WRITING.md, README.md)
### 7.1 `js:` page-script paths documented as content-relative; emitted root-relative — **MED**
`WRITING.md:773-775` vs `templates/default.html:37`
(`<script src="/$script-src$" defer>`). The doc claims a composition's
`js: scripts/widget.js` serves at `/music/symphony/scripts/widget.js`; the
template emits raw root-relative frontmatter. The only current user
(memento-mori) works by coincidence of its root-level route. A
composition following the doc would 404.
### 7.2 "Standalone page `content/my-page/index.md`" has no generic rule — **MED**
`WRITING.md:20` presents directory-form standalone pages as a general
capability; `build/Site.hs` hardcodes only `content/me/index.md` (293) and
`content/memento-mori/index.md` (307); the generic rule (351) matches flat
`content/*.md` only. A new `content/my-page/index.md` silently doesn't
build.
### 7.3 Portal table lists 8 portals; the build has 9 — **MED**
`WRITING.md:221-231` omits Photography, which is in `homePortals`
(`build/Site.hs:50-60`), the nav, and `content/tag-meta/photography.md`.
### 7.4 Three implemented frontmatter fields undocumented — **MED**
WRITING.md:3 claims to cover "all frontmatter fields"; zero hits for:
`summary:` (`build/Contexts.hs:415-427`, rendered by essay.html:16 and
reading.html:12, in live use), `revised:` (`build/Contexts.hs:815`
`getRevisions` — drives `$date-display$`/`$date-original$`/
`$revision-note$` and list sort order), `keywords:`
(`build/Contexts.hs:283` → `/bibliography/<kw>/` links).
### 7.5 Documentation LOWs
- `WRITING.md:268-269,82`: default citation style called "Chicago
Author-Date"; the injected CSL (`build/Citations.hs:114,167-168`) is
`data/chicago-notes.csl`, titled "Chicago Notes Bibliography".
- `README.md:12,19`: `make watch` described as "rebuilds on save without
a server"; it runs Hakyll's preview server (WRITING.md:1139 has it
right).
- `WRITING.md:105-109`: `history:` example ordering contradicts the code
(see §3.5).
---
## 8. nginx, Makefile & deployment
### 8.1 Multi-line CSP value embeds literal `\` + LF bytes — **MED**
`nginx/security-headers.conf:60-71`. The
`Content-Security-Policy-Report-Only` value is a single quoted string
spanning 12 lines with trailing `\` characters — nginx has no
line-continuation inside quoted strings, so the emitted header contains
raw backslash, LF, and leading-space bytes between directives. Raw LF in
a header value is illegal in HTTP/2 (vhost example enables `http2 on`);
strict clients reject the whole response. Sent on every response even as
Report-Only. Must be collapsed to one line.
### 8.2 CSP gaps that will fire under enforcement — **MED**
`nginx/security-headers.conf:66-67`. (a) `font-src 'self' data:` blocks
KaTeX webfonts: head.html:61 loads `katex.min.css` from cdn.jsdelivr.net,
whose relative font URLs resolve to the CDN. (b) `connect-src 'self'`
blocks the onnxruntime `.wasm` that transformers.js v2 (dynamically
imported in `static/js/semantic-search.js:25`) fetches from jsdelivr —
the config comment covers the same-origin model files but not the
runtime. Both latent while Report-Only.
### 8.3 Makefile auto-commit sweeps any pre-staged changes — **MED**
`Makefile:28-29`. `git add content/` followed by
`git diff --cached --quiet || git commit -m "auto: ..."` commits the
*entire index* — anything previously staged gets folded into an
`auto: <timestamp> [skip ci]` commit and pushed publicly on deploy. Use
`git commit -- content/` or verify no foreign paths are staged.
### 8.4 Makefile LOWs
- pdf-thumbs: the `find | while read` pipeline swallows `pdftoppm`
failures (loop exit status is the last iteration's) — a corrupt PDF
silently ships without a thumbnail.
- deploy: prerequisite order `clean build sign` is guaranteed only under
serial make; no `.NOTPARALLEL:` guard for `-j` invocations. (Confirmed:
deploy does run `clean` first; `.PHONY` is complete; `.env` export
allowlist is sound.)
- `tools/hooks/pre-commit-marks.sh` is documented (Makefile:175 comment)
but not installed — `.git/hooks/` has only samples and `core.hooksPath`
is unset.
Verified clean: all seven `data/` JSON/YAML files parse;
`data/embed-cache-pages.npz` is untracked, so the new gitignore entry is
fully effective; nginx archive.conf's add_header-inheritance re-include is
correct; no redirect loops; popup-proxy rate-limit/cache zones correctly
documented for http{} scope.
---
## 9. Working-tree diff review (branding refresh + embed split)
The model contract is **intact** — the diff splits one MiniLM pipeline
into two: pages now use nomic-embed-text-v1.5 (768d, build-only, for
similar-links.json); paragraphs stay on all-MiniLM-L6-v2@c9745ed (384d,
the browser contract). download-model.sh, model-checksums.sha256,
semantic-search.js (`DIM = 384`), and both WRITING.md lines (1108 nomic
for Related-pages, 1128 MiniLM for client search) are all consistent.
Icon declarations all match real files (verified with `file`: apple-touch
180×180, favicon-96 96×96, manifest PNGs 192/512, og-image 1200×630
matching declared og:image dimensions; the webp sidecar was regenerated).
Open items beyond §1.3/§1.4/§4.1:
### 9.1 32.8 KB traced SVG inlined into every page — **MED**
`templates/partials/logo-mark.svg` (32,818 bytes, potrace-style single
giant `<path>`) is inlined via the nav partial into every HTML page —
a ~33 KB per-page weight regression (pre-compression). The two-tone
`--logo-ink`/`--logo-bg` cutout (components.css:72-98) genuinely needs
inline SVG or `<use>`; an external sprite + `<use href>` restores
cacheability. Better still: a hand-drawn or simplified path — a traced
bitmap at nav size carries detail that can never resolve.
### 9.2 Icon asset bloat — **LOW**
`static/favicon.ico` is now 71,766 bytes; parsed directory shows
16/32/48/64/128/256 px entries, the 128+256 pair alone 55.8 KB. The .ico
is only the legacy fallback (modern browsers take the SVG); 16+32+48
(~8 KB) is conventional. `static/favicon.svg` is a 32,844-byte traced
path. `static/images/link-icons/internal.svg` went ~2 KB → 32,818 bytes
yet renders at 0.7–1.6 rem via CSS mask in three stylesheets
(components.css:853, typography.css:833, popups.css:161).
### 9.3 Webmanifest regressions — **NIT**
`static/site.webmanifest`: `purpose` changed maskable→`any` for both
icons (Android adaptive launchers will letterbox; convention is separate
`any` + `maskable` entries); still no `start_url`/`scope`/`description`
(Lighthouse installability warnings). JSON valid; icons verified.
---
## 10. Prior audit (AUDIT.md 2026-05-07) follow-up
| Finding | Status |
|---|---|
| §1.1 freeze unsolvable | **Effectively still open** — aeson pin fixed, but the freeze broke again via `distributive` after a system update (§1.1 above); the underlying freeze-vs-system-db fragility is unaddressed |
| §1.3 Python version mismatch | Fixed (`requires-python = ">=3.14"` matches `.python-version`) |
| §1.4 model checksums | Fixed (`tools/model-checksums.sha256`, 5 entries) |
| §9.1 nginx headers | Fixed (`nginx/security-headers.conf` + vhost example, README'd) — but see §8.1/§8.2 for new issues in that file |
| §6.1 `canto31.jpg` 4 MB | **Unfixed** |
| robots.txt / sitemap | Fixed (Site.hs:941/963, present in `_site/`) |
| README `paper/`/`spec.md` ghosts | Fixed |
| rsync target quoting | Fixed |
| date-quoting doc | Fixed (WRITING.md:106) |
| tag-meta no-title exception | Fixed (WRITING.md:238-251) |
---
## Suggested triage order
1. ~~`tools/refreeze.sh`~~ (§1.1 — in progress)
2. Delete `data/embed-cache-pages.npz.tmp.npz`; widen the gitignore
pattern; `git add` `logo-mark.svg` + `og-image.png` before committing
the branding diff (§1.4, §4.1)
3. Guard `ArchiveIndex.hs` file reads with `doesFileExist` (§1.2)
4. Pin or sandbox the nomic remote code (§1.3)
5. Fix the `/fiction/`–`/poetry/` 404s (§2.1) and the production-visible
frontend MEDs (§5.1, §5.2)
6. Collapse the nginx CSP to one line before ever flipping it to
enforcing (§8.1, §8.2)
7. The rest by severity as time allows


@ -1,5 +1,10 @@
.PHONY: build deploy sign download-model download-pdfjs download-leaflet compress-assets convert-images pdf-thumbs pdfs watch clean dev audit-marks archive-gc archive-wayback archive-check
# deploy's prerequisite order (clean -> build -> sign) is only correct
# serially; under `make -j` they could interleave. This build has no
# intra-target parallelism worth preserving, so disable it outright.
.NOTPARALLEL:
# Source .env for deploy / GitHub config if it exists.
# .env format: KEY=value (one per line, no `export` prefix, no quotes needed).
# Only the variables explicitly listed below are exported to recipe
@ -21,8 +26,12 @@ build:
# so a stray secret dropped under content/ is NOT auto-staged. To
# intentionally commit a normally-ignored file, use `git add -f`
# manually before running `make build`.
#
# The commit and its guard are pathspec-limited to content/ so that
# anything the user had previously staged for other reasons is left
# staged, not silently swept into the auto-commit.
@git add content/
@git diff --cached --quiet || git commit -m "auto: $$(date -u +%Y-%m-%dT%H:%M:%SZ) [skip ci]"
@git diff --cached --quiet -- content/ || git commit -m "auto: $$(date -u +%Y-%m-%dT%H:%M:%SZ) [skip ci]" -- content/
@mkdir -p data
@date +%s > data/build-start.txt
@./tools/convert-images.sh
@ -110,12 +119,16 @@ convert-images:
# Thumbnails are written as static/papers/foo.thumb.png alongside each PDF.
# Skipped silently when pdftoppm is not installed or static/papers/ is empty.
pdf-thumbs:
# A failing pdftoppm must at least warn: the `find | while` pipeline's
# exit status is the last iteration's, so without the `||` a corrupt
# PDF would silently ship without a thumbnail.
@if command -v pdftoppm >/dev/null 2>&1; then \
find static/papers -name '*.pdf' 2>/dev/null | while read pdf; do \
thumb="$${pdf%.pdf}.thumb"; \
if [ ! -f "$${thumb}.png" ] || [ "$$pdf" -nt "$${thumb}.png" ]; then \
echo " pdf-thumb $$pdf"; \
pdftoppm -r 100 -f 1 -l 1 -png -singlefile "$$pdf" "$$thumb"; \
pdftoppm -r 100 -f 1 -l 1 -png -singlefile "$$pdf" "$$thumb" \
|| echo "Warning: pdf-thumb failed for $$pdf (page ships without a thumbnail)" >&2; \
fi; \
done; \
else \


@ -9,14 +9,15 @@ with a custom build system in `build/` and a Haskell + JS + Python toolchain.
```sh
make build # one-shot production build into _site/
make dev # dev build (drafts visible) + local server on :8000
make watch # cabal-watch rebuild (drafts visible)
make watch # Hakyll live-reload dev server (drafts visible)
make clean # cabal run site -- clean
make deploy # clean → build → sign → push → rsync to VPS
```
`make build` always runs `make clean` implicitly when invoked from `make deploy`.
For day-to-day work, prefer `make dev` (which serves the site on
`http://localhost:8000`) or `make watch` (rebuilds on save without a server).
`http://localhost:8000`) or `make watch` (Hakyll's live-reload preview server,
which rebuilds on save and serves the site locally).
**Run `make build` any time you add or replace binary assets** (JPEG/PNG
figures, PDFs, music assets). `make dev` and `make watch` skip the


@ -17,15 +17,22 @@ frontmatter fields, and every authoring feature available in the Markdown source
| Fiction | `content/fiction/my-story.md` | `/fiction/my-story.html` |
| Composition | `content/music/{slug}/index.md` | `/music/{slug}/` |
| Standalone page | `content/my-page.md` | `/my-page.html` |
| Standalone page (with co-located assets) | `content/my-page/index.md` | `/my-page.html` |
| Standalone page (with co-located assets; needs a dedicated rule) | `content/me/index.md` | `/me.html` |
| Draft essay | `content/drafts/essays/my-draft.md` | `/drafts/essays/my-draft.html` (dev only) |
File names become URL slugs. Use lowercase, hyphen-separated words.
If a standalone page embeds co-located SVG score fragments or other relative assets,
place it in its own directory (`content/my-page/index.md`) rather than as a flat file.
Score fragment paths are resolved relative to the source file's directory; a flat
`content/my-page.md` would resolve them from `content/`, which is wrong.
Flat `content/<page>.md` is the generic standalone form — any flat file dropped
into `content/` builds automatically. Directory-form standalone pages
(`content/my-page/index.md`) are **not** picked up by the generic rule; each one
requires its own dedicated `match` rule in `build/Site.hs`. The two existing
ones are `content/me/index.md` and `content/memento-mori/index.md` — follow
their pattern when adding another.
The directory form exists for pages that embed co-located SVG score fragments
or other relative assets: score fragment paths are resolved relative to the
source file's directory, and a flat `content/my-page.md` would resolve them
from `content/`, which is wrong.
---
@ -65,9 +72,12 @@ subtitle: "An Optional Secondary Line" # optional; rendered below the title in
date: 2026-03-15 # required; used for ordering, feed, and display
abstract: > # optional; shown in the metadata block and link previews
A one-paragraph description of the piece.
summary: | # optional; rendered in a "Summary" box near the abstract
A structured summary. **Markdown allowed** — bold, lists, multiple paragraphs.
tags: # optional; see Tags section
- nonfiction
- nonfiction/philosophy
keywords: [lattices, simd] # optional; links to /bibliography/<keyword>/ pages (list or comma-separated string)
authors: # optional; overrides the default "Levi Neuwirth" link
- "Levi Neuwirth | /me.html"
- "Collaborator | https://their.site"
@ -79,7 +89,7 @@ further-reading: # optional; see Citations section
- someKey
- anotherKey
bibliography: data/custom.bib # optional; overrides data/bibliography.bib
csl: data/custom.csl # optional; overrides Chicago Author-Date
csl: data/custom.csl # optional; overrides Chicago Notes Bibliography
no-collapse: true # optional; disables collapsible h2/h3 sections
repository: https://git.levineuwirth.org/levi/repo # optional; "Repository" link in metadata
preprint: /papers/my-essay.pdf # optional; "Preprint" link in metadata (typeset PDF version)
@ -101,12 +111,20 @@ confidence-history: # list of integers; trend arrow derived from last two
peer-status: under-review # optional; unreviewed (default) | under-review | peer-reviewed | published | retracted
result-shape: mixed # optional; positive | negative | mixed | comparative | descriptive
# Version history — optional; falls back to git log, then to date frontmatter
# Version history — optional; falls back to git log, then to date frontmatter.
# Entries may be listed in any order — they are sorted by date at build time.
history:
- date: 2026-03-01 # ISO date; unquoted is fine (the Haskell YAML parser keeps it as a string)
note: Initial draft
- date: 2026-03-14
- date: 2026-03-14 # ISO date; unquoted is fine (the Haskell YAML parser keeps it as a string)
note: Expanded typography section; added citations
- date: 2026-03-01
note: Initial draft
# Revision log — optional; drives the date shown on cards and list pages
# (see Revision dates section)
revised:
- date: "2026-04-10"
note: "expanded the section on typography"
- date: "2026-03-20" # note is optional per-entry
---
```
@ -226,6 +244,7 @@ The top-level segment maps to a **portal** in the nav:
| Miscellany | `/miscellany/` |
| Music | `/music/` |
| Nonfiction | `/nonfiction/` |
| Photography | `/photography/` |
| Poetry | `/poetry/` |
| Research | `/research/` |
| Tech | `/tech/` |
@ -265,7 +284,8 @@ The URL part is optional.
## Citations
The citation pipeline uses Chicago Author-Date style. The bibliography lives at
The citation pipeline uses Chicago Notes Bibliography style
(`data/chicago-notes.csl`). The bibliography lives at
`data/bibliography.bib` (BibLaTeX format) by default; override per-page with
`bibliography` and `csl`.
@ -278,7 +298,7 @@ Multiple sources agree.[@jones2019; @brown2021]
```
Inline citations render as numbered superscripts `[1]`, `[2]`, etc. The
bibliography section appears automatically in the page footer. `citations.js`
bibliography section appears automatically in the page footer. `popups.js`
adds hover previews showing the full reference.
### Further reading
@ -754,9 +774,8 @@ at the top of the catalog.
## Page scripts
For pages that need custom JavaScript (interactive widgets, visualisations, etc.),
place the JS file alongside the content and reference it via the `js:` frontmatter
key. The file is copied to `_site/` and injected as a deferred `<script>` at the
bottom of `<body>`.
reference the JS file via the `js:` frontmatter key. The file is injected as a
deferred `<script>` at the bottom of `<body>`.
```yaml
js: scripts/memento-mori.js # single file
@ -770,12 +789,18 @@ js:
- scripts/widget-b.js
```
Paths are relative to the content file. A composition at
`content/music/symphony/index.md` with `js: scripts/widget.js` serves the
script at `/music/symphony/scripts/widget.js`.
Paths are **site-root-relative**, not relative to the content file: the template
emits the value verbatim with a leading `/` prepended. Write the path without a
leading slash. `js: scripts/widget.js` loads `/scripts/widget.js` regardless of
where the page lives — a composition at `content/music/symphony/index.md` with
that value does *not* get `/music/symphony/scripts/widget.js`.
No changes to the build system are needed — the `content/**/*.js` glob rule
copies all JS files from `content/` to `_site/` automatically.
The script file must live where the build serves that URL. The `content/**/*.js`
glob rule copies JS files to `_site/` with the `content/` prefix stripped, so
`content/scripts/widget.js` is served at `/scripts/widget.js` — this is the
current convention (the memento-mori page keeps its script at
`content/scripts/memento-mori.js` and references it as
`js: scripts/memento-mori.js`).
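
The two mappings described above must agree for a script to load. A small sketch of both, using hypothetical helper names (`served_url`, `script_src`) and assuming the rules are exactly "strip the `content/` prefix" and "prepend `/`" as stated:

```sh
# Where the build serves a JS source file: content/ prefix stripped.
served_url() {
    printf '/%s\n' "${1#content/}"
}

# What the template emits for a `js:` value: a leading slash prepended.
script_src() {
    printf '/%s\n' "$1"
}
```

For example, `served_url content/scripts/widget.js` and `script_src scripts/widget.js` both yield `/scripts/widget.js`, whereas a script kept at `content/music/symphony/scripts/widget.js` is served at a URL that `js: scripts/widget.js` does not point to.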
---
@ -896,7 +921,8 @@ should copy and adapt it; the file documents the §2.2 visual contract
The version history footer section uses a three-tier fallback:
1. **`history:` frontmatter** — your authored notes, shown exactly as written.
1. **`history:` frontmatter** — your authored notes. Entries may be listed in
any order — they are sorted by date at build time.
2. **Git log** — if no `history:` key, dates are extracted from `git log --follow`.
Entries have no message (date only).
3. **`date:` frontmatter** — if git has no commits for the file, falls back to
@ -910,14 +936,50 @@ descriptive:
```yaml
history:
- date: 2026-03-01
note: Initial draft
- date: 2026-03-14
note: Expanded section 3; incorporated feedback from peer review
- date: 2026-03-01
note: Initial draft
```
---
## Revision dates
The `revised:` key records substantive revisions and drives the date shown on
item cards and list pages. Two accepted shapes:
```yaml
revised: "2026-04-10" # scalar shorthand — one revision, no note
revised: # canonical list of objects
- date: "2026-04-10"
note: "expanded the section on Shestov"
- date: "2025-12-03" # note is optional per-entry
```
Dates are ISO `YYYY-MM-DD` strings. Entries may be listed in any order — they
are sorted by date at build time, most recent first. Entries missing `date:`
or carrying non-string values are silently dropped; the build never fails on
a malformed `revised:` block.
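
Listing order is irrelevant because ISO `YYYY-MM-DD` strings sort chronologically when compared as plain text. The actual sorting happens in the Haskell build; the shell sketch below only illustrates the principle, with `latest_revision` as a hypothetical helper that takes one candidate date per line:

```sh
# Hypothetical helper: drop lines that are not YYYY-MM-DD, sort the rest
# newest first by plain string comparison, and print the most recent one.
latest_revision() {
    grep -E '^[0-9]{4}-[0-9]{2}-[0-9]{2}$' | sort -r | head -n 1
}
```

Feeding it `2025-12-03`, a malformed line, and `2026-04-10` in any order yields `2026-04-10`, which mirrors the "malformed entries are dropped, newest wins" behaviour described above.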
Effects:
- **`$date-display$` / `$date-iso$`** — cards and list pages show the
most-recent revision date instead of the creation date.
- **Sort order** — revision-aware lists (`/new.html`, tag pages, the library)
sort by the display date, so a freshly revised piece moves to the top.
- **`$date-original$`** — when the latest revision date differs from the
creation date, the card adds a "revised from …" annotation showing the
original date.
- **`$revision-note$`** — the note on the most-recent entry renders as an
italicized line under the abstract on the card.
`revised:` is independent of `history:` (the version-history footer above);
add a matching `history:` entry if the revision should appear there too.
---
## Typography features
Applied automatically at build time; no markup needed.
@ -1125,7 +1187,7 @@ These pages are built automatically and require no content files or markup:
| Author indexes | `/authors/<slug>/` | All content attributed to an author |
| Random manifest | `/random-pages.json` | JSON array of page URLs for the random-page button |
| Atom feeds | `/feed.xml`, `/music/feed.xml` | All content feed + music-only feed |
| Search | `/search.html` | Pagefind full-text search + client-side semantic search (`nomic-embed-text-v1.5` ONNX model) |
| Search | `/search.html` | Pagefind full-text search + client-side semantic search (`all-MiniLM-L6-v2` ONNX model) |
---


@ -163,10 +163,18 @@ readManifest = do
else do
parsed <- Y.decodeFileEither manifestPath
case parsed of
Right es -> return es
Left e -> do
hPutStrLn stderr $
"[archive] FATAL: manifest.yaml: " ++ show e
-- An empty or all-comments file decodes as YAML @Null@,
-- not as a list. That is the legitimate "drained to zero
-- entries" state, not a broken file — treat it as the
-- empty manifest the absent-file branch already supports.
Right A.Null -> return []
Right v -> case A.fromJSON v of
A.Success es -> return es
A.Error msg -> fatal msg
Left e -> fatal (show e)
where
fatal msg = do
hPutStrLn stderr $ "[archive] FATAL: manifest.yaml: " ++ msg
exitFailure
readRemovedUrls :: IO (Set.Set T.Text)
@ -265,8 +273,17 @@ loadArchiveEntries = do
removed <- readRemovedUrls
validateManifestEntries manifest removed
provByUrl <- readProvenances
-- Join on normalised URLs, like every other URL comparison in the
-- archive system: editing a manifest URL to a normalisation-
-- equivalent form (http->https, trailing slash, tracking params)
-- must keep matching its provenance — an exact-string join would
-- silently unpublish the page while ArchiveIndex's normalised
-- filter keeps links pointing at it. Key collisions can't occur:
-- validateManifestEntries rejects normalised duplicates.
let normKey = T.unpack . normalizeUrl . T.pack
provByNorm = Map.mapKeys normKey provByUrl
fmap catMaybes $ forM manifest $ \me ->
case Map.lookup (meUrl me) provByUrl of
case Map.lookup (normKey (meUrl me)) provByNorm of
Nothing -> return Nothing
Just (slug, pv) -> do
let dir = "archive/" ++ slug
@ -299,6 +316,12 @@ loadArchiveEntries = do
-- ---------------------------------------------------------------------------
-- | All archive rules. Called once from 'Site.rules'.
--
-- The manifest is read here in 'preprocess' (and 'ArchiveIndex' reads
-- its sidecars in once-per-process CAFs), so archive state is fixed at
-- rule-generation time: under @site watch@, edits to @manifest.yaml@,
-- @removed.yaml@, or the regenerated state JSONs are not picked up
-- until the process restarts. One-shot builds are unaffected.
archiveRules :: Rules ()
archiveRules = do
entries <- preprocess loadArchiveEntries
@ -562,10 +585,17 @@ tallyOf xs = intercalate " \183 "
| (k, c) <- Map.toList (Map.fromListWith (+) [ (x, 1 :: Int) | x <- xs ]) ]
-- | The median of a list of ages, as @"N days"@; an em dash when empty.
-- An even-length list takes the mean of the two middle elements,
-- rounded to the nearest whole day.
medianAge :: [Int] -> String
medianAge [] = "\8212"
medianAge xs =
let m = sort xs !! (length xs `div` 2)
let sorted = sort xs
n = length sorted
upper = sorted !! (n `div` 2)
lower = sorted !! (n `div` 2 - 1) -- forced only when n is even
m | odd n = upper
| otherwise = (lower + upper + 1) `div` 2
in show m ++ if m == 1 then " day" else " days"
-- | Parse a @YYYY-MM-DD@ date; 'Nothing' on malformed input.


@ -15,11 +15,18 @@
-- * @Archive@ — surfaces each entry's rot status on its page, the
-- @/archive/@ index, and the @/build/@ telemetry.
--
-- Both files are loaded once per build via @unsafePerformIO@ CAFs. An
-- absent or malformed file degrades safely: an empty index makes the
-- Both files are loaded once per *process* via NOINLINE
-- @unsafePerformIO@ CAFs (as are the manifest/removed URL sets below).
-- An absent or malformed file degrades safely: an empty index makes the
-- link consumers no-op; an absent state file makes every entry @Live@
-- (the safe default — no link flip). @archive.py check@ is decoupled
-- from @make build@; a build consumes whatever state file exists.
--
-- Consequence of the once-per-process read (shared with the manifest
-- read in 'Archive.archiveRules'): under @site watch@, edits to
-- @manifest.yaml@, @removed.yaml@, or the regenerated state JSONs are
-- not re-read — the server renders stale archive state until restart.
-- One-shot builds (@make build@ / @make deploy@) are unaffected.
module ArchiveIndex
( ArchiveStatus (..)
, statusName
@ -132,6 +139,10 @@ activeUrls = unsafePerformIO $ do
{-# NOINLINE rawIndex #-}
rawIndex :: Map Text IdxEntry
rawIndex = unsafePerformIO $ do
exists <- doesFileExist indexPath
if not exists
then return Map.empty
else do
decoded <- A.eitherDecodeFileStrict' indexPath
let parsed = either (const Map.empty) id decoded
return $ Map.filterWithKey
@ -142,6 +153,10 @@ rawIndex = unsafePerformIO $ do
{-# NOINLINE rawState #-}
rawState :: Map Text ArchiveStatus
rawState = unsafePerformIO $ do
exists <- doesFileExist statePath
if not exists
then return Map.empty
else do
decoded <- A.eitherDecodeFileStrict' statePath
return $ either (const Map.empty) (Map.map seStatus) decoded


@ -138,6 +138,8 @@ isPageLink u
| otherwise =
not (T.isPrefixOf "http://" u) &&
not (T.isPrefixOf "https://" u) &&
-- protocol-relative //host/path is external, not a page path
not (T.isPrefixOf "//" u) &&
not (T.isPrefixOf "#" u) &&
not (T.isPrefixOf "mailto:" u) &&
not (T.isPrefixOf "tel:" u) &&
@ -213,18 +215,28 @@ splitSentences = go []
-- For every internal link in a paragraph, emit an entry carrying the HTML
-- of the sentence containing the link (default display) and the HTML of
-- the full paragraph (hover/popup context).
-- Recurses into Div, BlockQuote, BulletList, and OrderedList.
-- Recurses into Div, BlockQuote, BulletList, OrderedList, and
-- DefinitionList. @Plain@ matters as much as @Para@: Pandoc renders
-- tight list items (the default @- item@ Markdown form) as @Plain@
-- blocks, so without it every link written in a tight list would be
-- invisible to the backlinks system.
extractLinksWithContext :: Pandoc -> [LinkEntry]
extractLinksWithContext (Pandoc _ blocks) = concatMap go blocks
where
go :: Block -> [LinkEntry]
go (Para inlines) = paraEntries inlines
go (Plain inlines) = paraEntries inlines
go (BlockQuote bs) = concatMap go bs
go (Div _ bs) = concatMap go bs
go (BulletList items) = concatMap (concatMap go) items
go (OrderedList _ items) = concatMap (concatMap go) items
go (DefinitionList defs) = concatMap defEntries defs
go _ = []
defEntries :: ([Inline], [[Block]]) -> [LinkEntry]
defEntries (term, bodies) =
paraEntries term ++ concatMap (concatMap go) bodies
paraEntries :: [Inline] -> [LinkEntry]
paraEntries inlines =
let paraHtml = renderInlines inlines
@ -268,17 +280,25 @@ linksCompiler = do
-- URL normalisation
-- ---------------------------------------------------------------------------
-- | Normalise an internal URL as a map key: strip query string, fragment,
-- and trailing @.html@; ensure a leading slash; percent-decode the path
-- so that @\/essays\/caf%C3%A9@ and @\/essays\/café@ collide on the same
-- key.
-- | Normalise an internal URL as a map key: strip query string and
-- fragment; ensure a leading slash; strip a trailing @index.html@
-- (keeping the directory slash) before the bare @.html@ extension, so a
-- page routed @essays\/foo\/index.html@ and a body link authored in the
-- canonical directory form @\/essays\/foo\/@ collide on the same key
-- (mirrors 'SimilarLinks.normaliseUrl'); percent-decode the path so that
-- @\/essays\/caf%C3%A9@ and @\/essays\/café@ collide on the same key.
--
-- Both sides of the backlink join go through this function: page keys
-- via 'backlinksFieldWith' (@normaliseUrl ("/" ++ route)@) and link
-- targets via 'targetKey' — so the two always agree.
normaliseUrl :: String -> String
normaliseUrl url =
let t = T.pack url
t1 = fst (T.breakOn "?" (fst (T.breakOn "#" t)))
t2 = if T.isPrefixOf "/" t1 then t1 else "/" `T.append` t1
t3 = fromMaybe t2 (T.stripSuffix ".html" t2)
in percentDecode (T.unpack t3)
t3 = fromMaybe t2 (T.stripSuffix "index.html" t2)
t4 = fromMaybe t3 (T.stripSuffix ".html" t3)
in percentDecode (T.unpack t4)
-- | Decode percent-escapes (@%XX@) into raw bytes, then re-interpret the
-- resulting bytestring as UTF-8. Invalid escapes are passed through


@ -72,6 +72,8 @@ parseBibExtras path = Map.fromList . parseBib <$> readFile' path
-- ---------------------------------------------------------------------------
-- | Enumerate all entries in a .bib file as (citekey, extra) pairs.
-- @\@string@ \/ @\@comment@ \/ @\@preamble@ blocks (case-insensitive)
-- carry no citekey and are skipped wholesale.
parseBib :: String -> [(String, BibExtra)]
parseBib input = go (dropTo '@' input)
where
@ -81,10 +83,17 @@ parseBib input = go (dropTo '@' input)
go [] = []
go ('@':rest) =
let -- Entry type, then '{', then citekey, then ',', then fields, then '}'.
r1 = dropWhile isAlphaNum rest -- skip type name
(typeName, r1) = span isAlphaNum rest
r2 = dropWhile isSpace r1
in case r2 of
'{':r3 ->
'{':r3
-- Not citekey entries: a @string macro name (or the body
-- of a @comment/@preamble) must never be parsed as a
-- citekey. Skip the balanced brace group and carry on.
| map toLower typeName `elem` ["string", "comment", "preamble"] ->
let (_, r4) = readBraces 1 "" r3
in go (dropTo '@' r4)
| otherwise ->
let (citekey, r4) = span (\c -> c /= ',' && not (isSpace c)) r3
r5 = dropWhile (\c -> c /= ',' && c /= '}') r4
in case r5 of


@ -99,7 +99,12 @@ parseCatalogEntry item = do
year = parseYear meta
dur = lookupString "duration" meta
instr = lookupString "instrumentation" meta
cat = fromMaybe "other" (lookupString "category" meta)
-- Fold unknown categories into the canonical "other"
-- bucket here: two distinct unknown values share a rank
-- but would groupBy into separate groups, rendering as
-- adjacent duplicate "Other" sections.
rawCat = fromMaybe "other" (lookupString "category" meta)
cat = if rawCat `elem` categoryOrder then rawCat else "other"
return $ Just CatalogEntry
{ ceTitle = title
, ceUrl = url


@ -9,7 +9,8 @@ module Commonplace
import Data.Aeson (FromJSON (..), withObject, (.:), (.:?), (.!=))
import Data.List (nub, sortBy)
import Data.Ord (comparing, Down (..))
import qualified Data.ByteString.Char8 as BS
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE
import qualified Data.Yaml as Y
import Hakyll hiding (escapeHtml, renderTags)
import Contexts (siteCtx)
@ -140,7 +141,10 @@ loadCommonplace :: Compiler [CPEntry]
loadCommonplace = do
rawItem <- load (fromFilePath "data/commonplace.yaml") :: Compiler (Item String)
let raw = itemBody rawItem
case Y.decodeEither' (BS.pack raw) of
-- encodeUtf8, not Char8.pack: Char8 truncates each Char to 8 bits,
-- silently corrupting any codepoint above 0x7F (same hazard Now.hs
-- documents — em-dash 0x2014 would become control char 0x14).
case Y.decodeEither' (TE.encodeUtf8 (T.pack raw)) of
Left err -> fail ("commonplace.yaml: " ++ show err)
Right entries -> return entries


@ -22,6 +22,7 @@ module Contexts
, recentFirstByDisplay
, Revision (..)
, getRevisions
, isProvedConfidence
) where
import Data.Aeson (Value (..))
@ -86,7 +87,12 @@ affiliationField = listFieldWith "affiliation-links" ctx $ \item -> do
let entries = case lookupStringList "affiliation" meta of
Just xs -> xs
Nothing -> maybe [] (:[]) (lookupString "affiliation" meta)
return $ map (Item (fromFilePath "") . parseEntry) entries
-- noResult, not an empty list: Hakyll's $if$ treats an empty
-- ListField as truthy, so returning [] would render the wrapper
-- markup (an empty .meta-affiliation row) on every page.
if null entries
then noResult "no affiliation"
else return $ map (Item (fromFilePath "") . parseEntry) entries
where
ctx = field "affiliation-name" (return . fst . itemBody)
<> field "affiliation-url" (\i -> let u = snd (itemBody i)
@ -170,10 +176,17 @@ pageScriptsField = listFieldWith "page-scripts" ctx $ \item -> do
-- | List context field exposing an item's own (non-expanded) tags as
-- @tag-name@ / @tag-url@ objects.
--
-- Fails with 'noResult' when the item has no tags — same discipline
-- as the @Excluding@ variants below — so @$if(...)$@ gates are false
-- and templates don't emit empty tag-wrapper markup.
--
-- $for(essay-tags)$<a href="$tag-url$">$tag-name$</a>$endfor$
tagLinksField :: String -> Context a
tagLinksField fieldName = listFieldWith fieldName ctx $ \item ->
map toItem <$> getTags (itemIdentifier item)
tagLinksField fieldName = listFieldWith fieldName ctx $ \item -> do
ts <- getTags (itemIdentifier item)
if null ts
then noResult "no tags"
else return (map toItem ts)
where
toItem t = Item (fromFilePath (t ++ "/index.html")) t
ctx = field "tag-name" (return . itemBody)
@ -345,7 +358,7 @@ abstractField :: Context String
abstractField = field "abstract" $ \item -> do
meta <- getMetadata (itemIdentifier item)
case lookupString "abstract" meta of
Nothing -> fail "no abstract"
Nothing -> noResult "no abstract"
Just src -> do
let pandocResult = runPure $ do
doc <- readMarkdown defaultHakyllReaderOptions (T.pack src)
@ -379,7 +392,7 @@ descriptionField :: Context String
descriptionField = field "description" $ \item -> do
meta <- getMetadata (itemIdentifier item)
case lookupString "abstract" meta of
Nothing -> fail "no abstract"
Nothing -> noResult "no abstract"
Just src -> do
let pandocResult = runPure $ do
doc <- readMarkdown defaultHakyllReaderOptions (T.pack src)
@ -416,7 +429,7 @@ summaryField :: Context String
summaryField = field "summary" $ \item -> do
meta <- getMetadata (itemIdentifier item)
case lookupString "summary" meta of
Nothing -> fail "no summary"
Nothing -> noResult "no summary"
Just src -> do
let pandocResult = runPure $ do
doc <- readMarkdown defaultHakyllReaderOptions (T.pack src)
@ -462,11 +475,11 @@ bibliographyField = bibContent <> hasCitations
where
bibContent = field "bibliography" $ \item -> do
bib <- itemBody <$> loadSnapshot (itemIdentifier item) "bibliography"
if null bib then fail "no bibliography" else return bib
if null bib then noResult "no bibliography" else return bib
hasCitations = field "has-citations" $ \item -> do
bib <- itemBody <$> (loadSnapshot (itemIdentifier item) "bibliography"
:: Compiler (Item String))
if null bib then fail "no citations" else return "true"
if null bib then noResult "no citations" else return "true"
-- | Further-reading field: loads the further-reading HTML saved by essayCompiler.
-- Returns noResult (making $if(further-reading-refs)$ false) when empty.
@ -474,21 +487,25 @@ furtherReadingField :: Context String
furtherReadingField = field "further-reading-refs" $ \item -> do
fr <- itemBody <$> (loadSnapshot (itemIdentifier item) "further-reading-refs"
:: Compiler (Item String))
if null fr then fail "no further reading" else return fr
if null fr then noResult "no further reading" else return fr
-- ---------------------------------------------------------------------------
-- Epistemic fields
-- ---------------------------------------------------------------------------
-- | Render an integer 1–5 frontmatter key as filled/empty dot chars.
-- Returns @noResult@ when the key is absent or unparseable.
-- Returns @noResult@ when the key is absent, unparseable, or below 1
-- (a zero would otherwise render five empty circles); values above 5
-- clamp to 5.
dotsField :: String -> String -> Context String
dotsField ctxKey metaKey = field ctxKey $ \item -> do
meta <- getMetadata (itemIdentifier item)
case lookupString metaKey meta >>= readMaybe of
Nothing -> fail (ctxKey ++ ": not set")
Just (n :: Int) ->
let v = max 0 (min 5 n)
Nothing -> noResult (ctxKey ++ ": not set")
Just (n :: Int)
| n < 1 -> noResult (ctxKey ++ ": value below the 1-5 scale")
| otherwise ->
let v = min 5 n
in return (replicate v '\x25CF' ++ replicate (5 - v) '\x25CB')
-- | @$confidence-trend$@: ↑, ↓, or → derived from the last two entries
@ -513,11 +530,11 @@ confidenceTrendField = field "confidence-trend" $ \item -> do
"[Marks] " ++ toFilePath (itemIdentifier item) ++
": confidence: proved is incompatible with confidence-history; ignoring history"
Nothing -> return ()
fail "confidence is proved; trend suppressed"
noResult "confidence is proved; trend suppressed"
else case lookupStringList "confidence-history" meta of
Nothing -> fail "no confidence history"
Nothing -> noResult "no confidence history"
Just xs -> case lastTwo xs of
Nothing -> fail "no confidence history"
Nothing -> noResult "no confidence history"
Just (prevS, curS) ->
let prev = readMaybe prevS :: Maybe Int
cur = readMaybe curS :: Maybe Int
@ -583,7 +600,7 @@ overallScoreField = field "overall-score" $ \item -> do
+ fromIntegral (ev - 1) / 4.0 * 0.4
score = max 0 (min 100 (round (raw * 100.0) :: Int))
in return (show score)
_ -> fail "overall-score: confidence or evidence not set"
_ -> noResult "overall-score: confidence or evidence not set"
-- | @$confidence$@: numeric override that suppresses the @proved@ /
-- @proven@ sentinel. When the frontmatter value is parseable as an
@ -996,7 +1013,7 @@ compositionCtx =
hasScoreField = field "has-score" $ \item -> do
meta <- getMetadata (itemIdentifier item)
let pages = fromMaybe [] (lookupStringList "score-pages" meta)
if null pages then fail "no score pages" else return "true"
if null pages then noResult "no score pages" else return "true"
scorePageCountField = field "score-page-count" $ \item -> do
meta <- getMetadata (itemIdentifier item)
@ -1014,7 +1031,7 @@ compositionCtx =
hasMovementsField = field "has-movements" $ \item -> do
meta <- getMetadata (itemIdentifier item)
if null (parseMovements meta) then fail "no movements" else return "true"
if null (parseMovements meta) then noResult "no movements" else return "true"
movementsListField = listFieldWith "movements" movCtx $ \item -> do
meta <- getMetadata (itemIdentifier item)
@ -1032,9 +1049,9 @@ compositionCtx =
<> field "movement-page" (return . show . movPage . itemBody)
<> field "movement-duration" (return . movDuration . itemBody)
<> field "movement-audio"
(\i -> maybe (fail "no audio") return (movAudio (itemBody i)))
(\i -> maybe (noResult "no audio") return (movAudio (itemBody i)))
<> field "has-audio"
(\i -> maybe (fail "no audio") (const (return "true"))
(\i -> maybe (noResult "no audio") (const (return "true"))
(movAudio (itemBody i)))
-- ---------------------------------------------------------------------------
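
The revised `dotsField` bounds can be checked in isolation. The sketch below is a pure restatement, assuming a hypothetical helper `renderDots` that is not part of the patch; the real field runs inside Hakyll's `Compiler` and signals absence with `noResult` rather than `Nothing`.

```haskell
-- Hypothetical pure helper mirroring the new dotsField guards.
-- Nothing stands in for noResult; the name renderDots is illustrative only.
renderDots :: Int -> Maybe String
renderDots n
  | n < 1     = Nothing            -- below the 1-5 scale: suppress the field
  | otherwise = Just (replicate v '\x25CF' ++ replicate (5 - v) '\x25CB')
  where
    v = min 5 n                    -- values above 5 clamp to 5
```

With the old `max 0 (min 5 n)` clamp, a zero produced five empty circles; here it yields no result, so a template's `$if(...)$` branch stays false.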


@ -30,22 +30,45 @@ import Text.Pandoc.Walk (walk)
import ArchiveIndex (ArchiveStatus (..), archiveIndexIsEmpty,
archiveSlugFor, archiveStatusForSlug)
-- | Annotate body links. Headings are left alone — an affordance there
-- would be noise. Identity when the index is empty.
-- | Annotate body links. Links inside headings are left alone at
-- /every/ nesting depth — an affordance there would be noise, and a
-- top-level pattern match would miss a @Header@ inside a @Div@ or
-- @BlockQuote@. Header links are tagged with a sentinel class before
-- the annotation walk and stripped of it afterwards, so the sentinel
-- can never leak into the writer. Identity when the index is empty.
apply :: Pandoc -> Pandoc
apply doc@(Pandoc meta blocks)
apply doc
| archiveIndexIsEmpty = doc
| otherwise = Pandoc meta (map annotateBlock blocks)
| otherwise =
walk unprotectLink . walk annotateInlines . walk protectHeader $ doc
annotateBlock :: Block -> Block
annotateBlock h@Header{} = h
annotateBlock b = walk annotateInlines b
-- | Sentinel class marking a link the annotation walk must skip. It
-- only exists between the protect and unprotect walks inside 'apply'.
skipClass :: T.Text
skipClass = "archive-header-skip"
protectHeader :: Block -> Block
protectHeader (Header lvl attr ils) = Header lvl attr (walk protect ils)
where
protect (Link (ident, cls, kvs) text target) =
Link (ident, skipClass : cls, kvs) text target
protect x = x
protectHeader b = b
unprotectLink :: Inline -> Inline
unprotectLink (Link (ident, cls, kvs) text target)
| skipClass `elem` cls =
Link (ident, filter (/= skipClass) cls, kvs) text target
unprotectLink x = x
-- | For each archived @Link@: flip it if the target is 'Rotted', else
-- append the affordance. Non-archived links pass through untouched.
-- append the affordance. Non-archived links — and links protected by
-- 'protectHeader' — pass through untouched.
annotateInlines :: [Inline] -> [Inline]
annotateInlines = concatMap expand
where
expand l@(Link (_, cls, _) _ _)
| skipClass `elem` cls = [l]
expand l@(Link attr text (url, _)) =
case archiveSlugFor url of
Nothing -> [l]


@ -12,15 +12,23 @@
--
-- The file path must be root-relative (begins with @/@).
-- PDF.js is expected to be vendored at @/pdfjs/web/viewer.html@.
--
-- Code protection (honest scope): lines inside /fenced/ code blocks
-- are passed through untouched ('Filters.Wikilinks.mapOutsideFences'),
-- so fenced examples can show @{{pdf:…}}@ literally. Indented code
-- blocks and inline code spans are NOT recognised — a full-line
-- directive inside either is still rewritten.
module Filters.EmbedPdf (preprocess) where
import Data.Char (isDigit)
import Data.List (isPrefixOf, isSuffixOf)
import Filters.Wikilinks (mapOutsideFences)
import qualified Utils as U
-- | Apply PDF-embed substitution to the raw Markdown source string.
-- | Apply PDF-embed substitution to the raw Markdown source string,
-- skipping lines inside fenced code blocks.
preprocess :: String -> String
preprocess = unlines . map processLine . lines
preprocess = mapOutsideFences processLine
processLine :: String -> String
processLine line =


@ -231,7 +231,7 @@ renderPicture :: Attr -> [Inline] -> Target -> Bool -> Maybe (Int, Int) -> Text
renderPicture (ident, classes, kvs) alt (src, title) lightbox dims =
T.concat
[ "<picture>"
, "<source srcset=\"", T.pack webpSrc, "\" type=\"image/webp\">"
, "<source srcset=\"", esc (T.pack webpSrc), "\" type=\"image/webp\">"
, "<img"
, attrId ident
, attrClasses classes


@ -16,8 +16,11 @@ import Text.Pandoc.Definition
import Text.Pandoc.Walk (walk)
-- | Apply link classification to the entire document.
-- Two passes: PDF links first (rewrites href to viewer URL), then external
-- link classification (operates on http/https, so no overlap).
-- Two passes: PDF links first (rewrites href to the viewer URL and tags
-- the anchor @pdf-link@), then general classification. The second pass
-- explicitly skips anchors the PDF pass already claimed — the viewer URL
-- is root-relative, so without that guard it would also be classified as
-- an internal page link and get double chrome.
apply :: Pandoc -> Pandoc
apply = walk classifyLink . walk classifyPdfLink
@ -49,6 +52,11 @@ classifyLink l@(Link (_, classes, _) _ _)
-- brand icon stamp, and have their own popup provider. Leave them
-- entirely alone.
| "source-ref" `elem` classes = l
-- PDF links were already rewritten to the (root-relative) viewer URL
-- and given their own chrome by 'classifyPdfLink' in the preceding
-- pass; without this guard they would be double-classified as
-- internal page links.
| "pdf-link" `elem` classes = l
classifyLink (Link (ident, classes, kvs) ils (url, title))
| isExternal url =
let icon = domainIcon url
@ -100,8 +108,9 @@ isExternal url =
where
siteHost = "levineuwirth.org"
-- | Extract the lowercased hostname from an absolute http(s) URL.
-- Returns 'Nothing' for non-http(s) URLs (relative paths, mailto:, etc.).
-- | Extract the lowercased hostname from an absolute http(s) URL,
-- stripping any userinfo (@user:pass\@@) and port. Returns 'Nothing'
-- for non-http(s) URLs (relative paths, mailto:, etc.).
extractHost :: Text -> Maybe Text
extractHost url
| Just rest <- T.stripPrefix "https://" url = Just (hostOf rest)
@ -109,45 +118,60 @@ extractHost url
| otherwise = Nothing
where
hostOf rest =
let withPort = T.takeWhile (\c -> c /= '/' && c /= '?' && c /= '#') rest
host = T.takeWhile (/= ':') withPort
let authority = T.takeWhile (\c -> c /= '/' && c /= '?' && c /= '#') rest
-- 'T.breakOnEnd' yields the segment after the last @\@@, or
-- the whole authority when there is no userinfo.
(_, hostPort) = T.breakOnEnd "@" authority
host = T.takeWhile (/= ':') hostPort
in T.toLower host
-- | Icon name for the link, matching a file in /images/link-icons/<name>.svg.
--
-- Matches on the URL's host only, never on the full URL — a path like
-- @https://example.org/why-x.com-failed@ must not get the Twitter
-- icon. URLs with no extractable host get the generic icon.
domainIcon :: Text -> Text
domainIcon url
domainIcon url = maybe "external" iconForHost (extractHost url)
iconForHost :: Text -> Text
iconForHost host
-- Scholarly / reference
| "wikipedia.org" `T.isInfixOf` url = "wikipedia"
| "arxiv.org" `T.isInfixOf` url = "arxiv"
| "doi.org" `T.isInfixOf` url = "doi"
| "worldcat.org" `T.isInfixOf` url = "worldcat"
| "orcid.org" `T.isInfixOf` url = "orcid"
| "archive.org" `T.isInfixOf` url = "internet-archive"
| m "wikipedia.org" = "wikipedia"
| m "arxiv.org" = "arxiv"
| m "doi.org" = "doi"
| m "worldcat.org" = "worldcat"
| m "orcid.org" = "orcid"
| m "archive.org" = "internet-archive"
-- Code / software
| "github.com" `T.isInfixOf` url = "github"
| "git.levineuwirth.org" `T.isInfixOf` url = "forgejo"
| "tensorflow.org" `T.isInfixOf` url = "tensorflow"
| m "github.com" = "github"
| m "git.levineuwirth.org" = "forgejo"
| m "tensorflow.org" = "tensorflow"
-- AI companies (consumer products share a brand icon with the lab)
| "anthropic.com" `T.isInfixOf` url = "anthropic"
| "claude.ai" `T.isInfixOf` url = "anthropic"
| "openai.com" `T.isInfixOf` url = "openai"
| "chatgpt.com" `T.isInfixOf` url = "openai"
| m "anthropic.com" = "anthropic"
| m "claude.ai" = "anthropic"
| m "openai.com" = "openai"
| m "chatgpt.com" = "openai"
-- Social / media
| "twitter.com" `T.isInfixOf` url = "twitter"
| "x.com" `T.isInfixOf` url = "twitter"
| "reddit.com" `T.isInfixOf` url = "reddit"
| "youtube.com" `T.isInfixOf` url = "youtube"
| "youtu.be" `T.isInfixOf` url = "youtube"
| "tiktok.com" `T.isInfixOf` url = "tiktok"
| "substack.com" `T.isInfixOf` url = "substack"
| "news.ycombinator.com" `T.isInfixOf` url = "hacker-news"
| "lesswrong.com" `T.isInfixOf` url = "lesswrong"
| m "twitter.com" = "twitter"
| m "x.com" = "twitter"
| m "reddit.com" = "reddit"
| m "youtube.com" = "youtube"
| m "youtu.be" = "youtube"
| m "tiktok.com" = "tiktok"
| m "substack.com" = "substack"
| m "news.ycombinator.com" = "hacker-news"
| m "lesswrong.com" = "lesswrong"
-- News
| "nytimes.com" `T.isInfixOf` url = "new-york-times"
| m "nytimes.com" = "new-york-times"
-- Institutions
| "nasa.gov" `T.isInfixOf` url = "nasa"
| "apple.com" `T.isInfixOf` url = "apple"
| m "nasa.gov" = "nasa"
| m "apple.com" = "apple"
| otherwise = "external"
where
-- Label-suffix match: the host is the domain itself or a subdomain
-- of it. Never fires on a lookalike label (@notx.com@) or on text
-- in the path or query.
m d = host == d || ("." <> d) `T.isSuffixOf` host
-- | Percent-encode characters that would break a @?file=@ query-string value.
-- Slashes are intentionally left unencoded so root-relative paths remain
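
The host-only matching introduced above can be exercised without Pandoc or `Data.Text`. The following is a rough `String`-based sketch; `hostOf` and `matchesDomain` are hypothetical stand-ins for `extractHost` and the local `m` helper, written here only to illustrate the label-suffix rule.

```haskell
import Data.Char (toLower)
import Data.List (isSuffixOf, stripPrefix)

-- Hypothetical String analogue of extractHost: drops scheme, userinfo,
-- port, path, query and fragment, then lowercases what remains.
hostOf :: String -> Maybe String
hostOf url = case (stripPrefix "https://" url, stripPrefix "http://" url) of
  (Just rest, _) -> Just (clean rest)
  (_, Just rest) -> Just (clean rest)
  _              -> Nothing
  where
    clean rest =
      let authority = takeWhile (`notElem` "/?#") rest
          hostPort  = afterLastAt authority
      in map toLower (takeWhile (/= ':') hostPort)
    -- Keep only the segment after the last '@' (the whole string if none).
    afterLastAt s = case break (== '@') s of
      (_, '@' : more) -> afterLastAt more
      (whole, _)      -> whole

-- Label-suffix match: the host is the domain itself or a subdomain of it.
matchesDomain :: String -> String -> Bool
matchesDomain d host = host == d || ('.' : d) `isSuffixOf` host
```

Under this rule `api.x.com` matches `x.com`, while `notx.com` and a path such as `/why-x.com-failed` do not, which is the false positive the old `T.isInfixOf` check allowed.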


@ -15,6 +15,7 @@
module Filters.Score (inlineScores) where
import Control.Exception (IOException, try)
import Data.Char (isHexDigit)
import Data.Maybe (listToMaybe)
import qualified Data.Text as T
import qualified Data.Text.IO as TIO
@ -86,25 +87,48 @@ findImagePath blocks = listToMaybe
-- | Replace hardcoded black fill/stroke values with @currentColor@ so the
-- SVG inherits the CSS @color@ property in both light and dark modes.
--
-- 6-digit hex patterns are at the bottom of the composition chain
-- (applied first) so they are replaced before the 3-digit shorthand,
-- preventing partial matches (e.g. @#000@ matching the prefix of @#000000@).
-- Quoted attribute forms (@fill="#000"@) are self-delimiting — the
-- closing quote bounds the match — so plain 'T.replace' is safe for
-- them. Unquoted style-property forms (@fill:#000@) are not: naive
-- replacement would also fire on the prefix of a longer hex colour
-- (@fill:#000080@ → @fill:currentColor80@, invalid CSS). Those go
-- through 'replaceHexColor', which rewrites a match only when it is
-- not followed by another hex digit; the boundary check also makes
-- the 3-digit/6-digit application order irrelevant.
processColors :: T.Text -> T.Text
processColors
-- 3-digit hex and keyword patterns (applied after 6-digit replacements)
-- 3-digit hex and keyword patterns
= T.replace "fill=\"#000\"" "fill=\"currentColor\""
. T.replace "fill=\"black\"" "fill=\"currentColor\""
. T.replace "stroke=\"#000\"" "stroke=\"currentColor\""
. T.replace "stroke=\"black\"" "stroke=\"currentColor\""
. T.replace "fill:#000" "fill:currentColor"
. replaceHexColor "fill:#000" "fill:currentColor"
. T.replace "fill:black" "fill:currentColor"
. T.replace "stroke:#000" "stroke:currentColor"
. replaceHexColor "stroke:#000" "stroke:currentColor"
. T.replace "stroke:black" "stroke:currentColor"
-- 6-digit hex patterns (applied first — bottom of the chain)
. T.replace "fill=\"#000000\"" "fill=\"currentColor\""
. T.replace "stroke=\"#000000\"" "stroke=\"currentColor\""
. T.replace "fill:#000000" "fill:currentColor"
. T.replace "stroke:#000000" "stroke:currentColor"
. replaceHexColor "fill:#000000" "fill:currentColor"
. replaceHexColor "stroke:#000000" "stroke:currentColor"
-- | 'T.replace' restricted to hex-boundary-terminated matches: an
-- occurrence of @needle@ is rewritten only when the character after
-- it is not another hex digit, so @fill:#000@ never fires inside the
-- longer colours @fill:#0008@, @fill:#000080@, or @fill:#00000080@.
replaceHexColor :: T.Text -> T.Text -> T.Text -> T.Text
replaceHexColor needle replacement = go
where
go t =
let (pre, rest) = T.breakOn needle t
in if T.null rest
then pre
else
let after = T.drop (T.length needle) rest
in case T.uncons after of
Just (c, _) | isHexDigit c ->
pre <> needle <> go after
_ -> pre <> replacement <> go after
buildHtml :: Maybe T.Text -> Maybe T.Text -> T.Text -> T.Text
buildHtml mName mCaption svgContent = T.concat
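
To see why the hex-boundary check matters, here is a small `String`-based sketch of the same idea. `replaceHexColorS` is a hypothetical re-expression, not the patch's `Text` implementation, and it assumes a non-empty needle.

```haskell
import Data.Char (isHexDigit)
import Data.List (isPrefixOf)

-- Hypothetical String version of replaceHexColor. Assumes needle is
-- non-empty; an empty needle would never consume input.
replaceHexColorS :: String -> String -> String -> String
replaceHexColorS needle replacement = go
  where
    n = length needle
    go [] = []
    go s@(c : cs)
      | needle `isPrefixOf` s =
          let after = drop n s
          in case after of
               -- Followed by a hex digit: part of a longer colour, keep it.
               (h : _) | isHexDigit h -> needle ++ go after
               -- Otherwise the match is complete: rewrite it.
               _                      -> replacement ++ go after
      | otherwise = c : go cs
```

A value such as `fill:#000080` passes through unchanged, whereas plain `T.replace` would have produced `fill:currentColor80`. Because the 3-digit needle never fires inside a 6-digit colour, the two replacements can be composed in either order.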


@ -4,12 +4,23 @@
--
-- Each footnote becomes:
-- * A @<sup class="sidenote-ref">@ anchor in the body text.
-- * An @<aside class="sidenote">@ immediately following it, containing
-- * A @<span class="sidenote">@ immediately following it, containing
-- the rendered note content.
--
-- On wide viewports, sidenotes.css floats asides into the right margin.
-- On narrow viewports they are hidden; the standard Pandoc-generated
-- @<section class="footnotes">@ at the document end serves as fallback.
-- Additionally, every consumed note is re-emitted in a
-- @<section class="footnotes">@ appended at the document end. The
-- filter swallows Pandoc's own @Note@ inlines, so Pandoc's writer
-- never produces that section itself — without this re-emission,
-- narrow viewports with JavaScript disabled (where sidenotes.css
-- hides @.sidenote@ and sidenotes.js's bottom sheet never runs)
-- would lose footnote content entirely.
--
-- On wide viewports, sidenotes.css floats the spans into the right
-- margin and hides @section.footnotes@; on narrow viewports the
-- spans are hidden and the section is shown. The in-text anchor
-- targets the footnotes item (the only target visible on narrow
-- no-JS viewports); sidenotes.js intercepts clicks and pairs
-- ref\/note by element id, so the href is purely the no-JS path.
module Filters.Sidenotes (apply) where
import Control.Monad.State.Strict
@ -18,21 +29,58 @@ import Data.Text (Text)
import qualified Data.Text as T
import Text.Pandoc.Class (runPure)
import Text.Pandoc.Definition
import Text.Pandoc.Options (WriterOptions)
import Text.Pandoc.Options (WriterOptions (..),
HTMLMathMethod (KaTeX))
import Text.Pandoc.Walk (walkM)
import Text.Pandoc.Writers.HTML (writeHtml5String)
-- | Transform all @Note@ inlines in the document to inline sidenote HTML.
apply :: Pandoc -> Pandoc
apply doc = evalState (walkM convertNote doc) (1 :: Int)
-- | Accumulator: next label counter plus collected notes
-- (newest-first; reversed before rendering the fallback section).
type NoteState = (Int, [(Text, [Block])])
convertNote :: Inline -> State Int Inline
-- | Transform all @Note@ inlines in the document to inline sidenote
-- HTML, and append the collected notes as a @section.footnotes@
-- fallback block.
apply :: Pandoc -> Pandoc
apply doc =
let (Pandoc m blocks, (_, collected)) =
runState (walkM convertNote doc) (1, [])
notes = reverse collected
in Pandoc m $
if null notes
then blocks
else blocks ++ [footnotesSection notes]
convertNote :: Inline -> State NoteState Inline
convertNote (Note blocks) = do
n <- get
put (n + 1)
(n, acc) <- get
put (n + 1, (toLabel n, blocks) : acc)
return $ RawInline "html" (renderNote n blocks)
convertNote x = return x
-- | The end-of-document fallback list. Letter labels are rendered
-- explicitly (an @<ol>@'s automatic numbering would disagree with
-- the in-text letters), so the list itself is unstyled.
footnotesSection :: [(Text, [Block])] -> Block
footnotesSection notes = RawBlock "html" $ T.concat $
[ "<section class=\"footnotes\" role=\"doc-endnotes\">"
, "<ol class=\"footnotes-list\">"
]
++ map item notes ++
[ "</ol>"
, "</section>"
]
where
item (lbl, blocks) = T.concat
[ "<li id=\"fn-", lbl, "\" class=\"footnote-item\">"
, "<span class=\"footnote-label\" aria-hidden=\"true\">", lbl, "</span>"
, blocksToHtml blocks
, "<a href=\"#snref-", lbl
, "\" class=\"footnote-back\" role=\"doc-backlink\""
, " aria-label=\"Back to reference ", lbl, "\">\x21a9\xfe0e</a>"
, "</li>"
]
-- | Convert a 1-based counter to a letter label using base-26 expansion
-- (Excel-column style): 1→a, 2→b, … 26→z, 27→aa, 28→ab, … 52→az,
-- 53→ba, … 702→zz, 703→aaa. Guarantees a unique label per counter so
@ -53,8 +101,14 @@ renderNote n blocks =
let inner = blocksToInlineHtml blocks
lbl = toLabel n
in T.concat
-- href targets the footnotes-section item: on narrow no-JS
-- viewports that is the only visible rendering of the note
-- (the adjacent .sidenote span is display:none there, and on
-- wide viewports the note is already visible in the margin).
-- sidenotes.js pairs ref/note by id and preventDefaults the
-- click, so the href only ever navigates without JS.
[ "<sup class=\"sidenote-ref\" id=\"snref-", lbl, "\">"
, "<a href=\"#sn-", lbl, "\">", lbl, "</a>"
, "<a href=\"#fn-", lbl, "\">", lbl, "</a>"
, "</sup>"
, "<span class=\"sidenote\" id=\"sn-", lbl, "\">"
, "<sup class=\"sidenote-num\">", lbl, "</sup>\x00a0"
@ -84,16 +138,25 @@ blocksToInlineHtml = T.concat . map renderOne
renderOne b =
blocksToHtml [b]
-- | Writer options for note bodies. Must agree with the math method in
-- 'Compilers.writerOpts' (KaTeX), or math inside a footnote silently
-- degrades to the writer default (PlainMath -> italics) and the
-- client-side KaTeX pass never sees it. Defined locally because
-- importing Compilers from here would create a module cycle
-- (Compilers -> Filters -> Filters.Sidenotes).
noteWriterOpts :: WriterOptions
noteWriterOpts = def { writerHTMLMathMethod = KaTeX "" }
-- | Render a list of inlines to HTML (no surrounding @<p>@).
inlinesToHtml :: [Inline] -> Text
inlinesToHtml inlines =
case runPure (writeHtml5String (def :: WriterOptions) (Pandoc mempty [Plain inlines])) of
case runPure (writeHtml5String noteWriterOpts (Pandoc mempty [Plain inlines])) of
Left _ -> T.empty
Right t -> t
-- | Render a list of Pandoc blocks to an HTML fragment via a pure writer run.
blocksToHtml :: [Block] -> Text
blocksToHtml blocks =
case runPure (writeHtml5String (def :: WriterOptions) (Pandoc mempty blocks)) of
case runPure (writeHtml5String noteWriterOpts (Pandoc mempty blocks)) of
Left _ -> T.empty
Right t -> t
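
The body of `toLabel` is not shown in this diff, so the following is only a guess at one way to get the documented mapping (1→a, 26→z, 27→aa, 702→zz, 703→aaa). The name `toLabelSketch` and the choice to return an empty string for non-positive input are assumptions.

```haskell
-- Hypothetical bijective base-26 labelling, matching the documented
-- examples; not necessarily how the real toLabel is written.
toLabelSketch :: Int -> String
toLabelSketch n
  | n <= 0    = ""                 -- assumption: no label for non-positive input
  | otherwise = go n ""
  where
    go 0 acc = acc
    go k acc =
      -- Subtracting 1 first makes the system bijective: there is no
      -- zero digit, so 26 is "z" and 27 rolls over to "aa".
      let (q, r) = (k - 1) `divMod` 26
      in go q (toEnum (fromEnum 'a' + r) : acc)
```

Since every counter value maps to a distinct label, the `fn-`, `sn-` and `snref-` ids derived from it stay unique however many notes a page carries.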


@ -14,7 +14,8 @@
-- extra filter logic is needed for that case.
--
-- The filter is /not/ applied inside headings (where Fira Sans uppercase
-- text looks intentional) or inside @Code@/@RawInline@ inlines.
-- text looks intentional, at any nesting depth — including headings
-- inside divs and block quotes) or inside @Code@/@RawInline@ inlines.
module Filters.Smallcaps (apply) where
import Data.Char (isUpper, isAlpha)
@ -25,13 +26,31 @@ import Text.Pandoc.Walk (walk)
import qualified Utils as U
-- | Apply smallcaps detection to paragraph-level content.
-- Skips heading blocks to avoid false positives.
-- Heading blocks are skipped at /every/ nesting level (a top-level
-- pattern match would miss a @Header@ inside a @Div@ or
-- @BlockQuote@): each header's @Str@ content is swapped for a
-- sentinel 'RawInline' before the wrapping walk and restored
-- afterwards, so 'wrapCaps' can never see it, wherever the header
-- sits in the block tree.
apply :: Pandoc -> Pandoc
apply (Pandoc meta blocks) = Pandoc meta (map applyBlock blocks)
apply = walk restoreStr . walk wrapCaps . walk protectHeader
applyBlock :: Block -> Block
applyBlock b@(Header {}) = b -- leave headings untouched
applyBlock b = walk wrapCaps b
-- | Sentinel format marking a @Str@ that must not be wrapped. It only
-- exists between the protect and restore walks inside 'apply' and
-- can never leak into the writer.
skipFmt :: Format
skipFmt = Format "smallcaps-skip"
protectHeader :: Block -> Block
protectHeader (Header lvl attr ils) = Header lvl attr (walk protectStr ils)
where
protectStr (Str t) = RawInline skipFmt t
protectStr x = x
protectHeader b = b
restoreStr :: Inline -> Inline
restoreStr (RawInline fmt t) | fmt == skipFmt = Str t
restoreStr x = x
-- | Wrap an all-caps Str token in an abbr element, preserving any trailing
-- punctuation (comma, period, colon, semicolon, closing paren/bracket)


@ -19,12 +19,15 @@
-- source-preview rule in 'Site.rules') and renders a
-- syntax-highlighted snippet via Prism.
--
-- Conservative-by-design: the trigger only fires on paths under a
-- short whitelist of top-level directories, or a small set of named
-- root files. This keeps the parser cheap and avoids false positives
-- on words that happen to contain a slash and a dot.
-- Conservative-by-design: the trigger only fires on paths the
-- @/source/@ serving rule actually publishes ('isServedPath', a
-- mirror of @sourcePreviewable@ in 'Site.rules'), or a small set of
-- named root files. This keeps the parser cheap, avoids false
-- positives on words that happen to contain a slash and a dot, and
-- guarantees every wrapped path has a fetchable @/source/…@ copy.
module Filters.SourceRefs (apply, isSourcePath, forgejoSourceUrl) where
import Control.Monad (when)
import Data.IORef (IORef, atomicModifyIORef', newIORef, readIORef)
import qualified Data.Map.Strict as Map
import Data.Text (Text)
@ -94,16 +97,17 @@ classifyExistingLink x = pure x
-- Heuristic
-- ---------------------------------------------------------------------------
-- | True when the text looks like a repo-relative path under one of
-- the whitelisted directories (or is a whitelisted root file), ends
-- in a known source extension, and contains only safe path
-- characters. Conservative by design — the goal is no false
-- positives on prose that incidentally contains a slash and a dot.
-- | True when the text looks like a repo-relative path that the
-- @/source/@ serving rule actually publishes (or is a whitelisted
-- root file), ends in a known source extension, and contains only
-- safe path characters. Conservative by design — the goal is no
-- false positives on prose that incidentally contains a slash and a
-- dot, and no wrapped path whose popup fetch would 404.
isSourcePath :: Text -> Bool
isSourcePath t = and
[ not (T.null t)
, T.all safeChar t
, (hasKnownPrefix t && hasKnownExt t) || isKnownRootFile t
, (isServedPath t && hasKnownExt t) || isKnownRootFile t
]
where
safeChar c =
@ -112,11 +116,26 @@ isSourcePath t = and
|| ('0' <= c && c <= '9')
|| c == '/' || c == '.' || c == '_' || c == '-' || c == '+'
hasKnownPrefix :: Text -> Bool
hasKnownPrefix t = any (`T.isPrefixOf` t)
[ "build/", "static/", "templates/", "tools/"
, "nginx/", "data/", "content/", "yaml-source/"
-- | Mirror of the @sourcePreviewable@ whitelist in 'Site.rules' (the
-- rule that copies files to @/source/<path>@) — the two must stay
-- aligned so every link this filter emits has a corresponding
-- @/source/…@ target for the popup to fetch. Directories Site.hs
-- does not serve (e.g. @content/@) are deliberately absent here:
-- wrapping them would emit popups that are guaranteed to 404.
isServedPath :: Text -> Bool
isServedPath t = or
[ "build/" `T.isPrefixOf` t && hasExt ".hs"
, "static/js/" `T.isPrefixOf` t
, "static/css/" `T.isPrefixOf` t
, "templates/" `T.isPrefixOf` t
, "tools/" `T.isPrefixOf` t && (hasExt ".sh" || hasExt ".py")
, "nginx/" `T.isPrefixOf` t && hasExt ".conf"
, "data/" `T.isPrefixOf` t
&& not ("/" `T.isInfixOf` T.drop 5 t) -- top-level data files only
&& (hasExt ".json" || hasExt ".yaml" || hasExt ".md" || hasExt ".bib")
]
where
hasExt e = e `T.isSuffixOf` T.toLower t
hasKnownExt :: Text -> Bool
hasKnownExt t =
@ -125,7 +144,7 @@ hasKnownExt t =
[ ".hs", ".js", ".mjs", ".css", ".html"
, ".py", ".cabal", ".md", ".yaml", ".yml"
, ".toml", ".sh", ".bash", ".svg", ".conf"
, ".json", ".ini", ".tex"
, ".json", ".ini", ".tex", ".bib"
]
isKnownRootFile :: Text -> Bool
@ -142,14 +161,19 @@ isKnownRootFile t = t `elem`
-- File existence cache
-- ---------------------------------------------------------------------------
-- | Process-wide memo of @doesFileExist@ results, keyed by the same
-- path the popup will fetch. Hakyll runs this filter once per
-- compiled page and the same source-file references recur across
-- | Process-wide memo of /positive/ @doesFileExist@ results, keyed by
-- the same path the popup will fetch. Hakyll runs this filter once
-- per compiled page and the same source-file references recur across
-- many pages (e.g. @build\/Filters\/Links.hs@ in the Links page,
-- the Colophon, several essays); the cache turns N stats into one
-- per distinct path. The build process's working directory is the
-- project root, so the path can be passed straight to
-- 'doesFileExist' without prefixing.
-- per distinct path. Only existence is memoized: a missing file is
-- re-stat'ed on every miss, so a source file created during a
-- long-lived @make watch@ session is picked up on the next rebuild
-- instead of staying "absent" for the process lifetime. (A file
-- /deleted/ mid-watch stays cached as present until restart — the
-- benign direction: the popup fetch 404s and simply never appears.)
-- The build process's working directory is the project root, so the
-- path can be passed straight to 'doesFileExist' without prefixing.
{-# NOINLINE existsCacheRef #-}
existsCacheRef :: IORef (Map.Map Text Bool)
existsCacheRef = unsafePerformIO (newIORef Map.empty)
@ -161,6 +185,7 @@ existsCached path = do
Just b -> pure b
Nothing -> do
b <- doesFileExist (T.unpack path)
when b $
atomicModifyIORef' existsCacheRef (\m -> (Map.insert path b m, ()))
pure b
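
The positive-only memo can be tried outside the build with a fake probe in place of `doesFileExist`. In this sketch `memoPositive`, `demo`, and the key `build/Site.hs` are all hypothetical; the cache is passed explicitly instead of living in a process-wide `unsafePerformIO` reference.

```haskell
import Control.Monad (when)
import Data.IORef
  (IORef, atomicModifyIORef', modifyIORef', newIORef, readIORef, writeIORef)
import qualified Data.Map.Strict as Map

-- Memoize only positive probe results; a miss is re-probed every time.
memoPositive :: IORef (Map.Map String Bool) -> (String -> IO Bool) -> String -> IO Bool
memoPositive ref probe key = do
  cache <- readIORef ref
  case Map.lookup key cache of
    Just b  -> pure b
    Nothing -> do
      b <- probe key
      when b $ atomicModifyIORef' ref (\m -> (Map.insert key b m, ()))
      pure b

-- Simulates a file that appears, then disappears, during a watch session.
-- Returns the four lookup results and the number of probe calls made.
demo :: IO ([Bool], Int)
demo = do
  cache  <- newIORef Map.empty
  exists <- newIORef False
  probes <- newIORef (0 :: Int)
  let probe _ = modifyIORef' probes (+ 1) >> readIORef exists
      look    = memoPositive cache probe "build/Site.hs"
  a <- look                 -- absent: probed, not cached
  b <- look                 -- still absent: probed again
  writeIORef exists True
  c <- look                 -- now present: probed and cached
  writeIORef exists False
  d <- look                 -- deleted, but served from the cache
  n <- readIORef probes
  pure ([a, b, c, d], n)
```

The fourth lookup shows the trade-off the comment describes: a deletion goes unnoticed until restart, while a newly created file is seen on the next rebuild.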


@ -5,7 +5,13 @@
-- HTML placeholders that transclude.js resolves at runtime.
--
-- A directive must be the sole content of a line (after trimming) to be
-- replaced — this prevents accidental substitution inside prose or code.
-- replaced — this prevents accidental substitution inside prose.
--
-- Code protection (honest scope): lines inside /fenced/ code blocks
-- are passed through untouched ('Filters.Wikilinks.mapOutsideFences'),
-- so fenced examples can show @{{slug}}@ literally. Indented code
-- blocks and inline code spans are NOT recognised — a full-line
-- directive inside either is still rewritten.
--
-- Examples:
-- {{my-essay}} → full-page transclusion of /my-essay.html
@ -14,11 +20,13 @@
module Filters.Transclusion (preprocess) where
import Data.List (isSuffixOf, isPrefixOf, stripPrefix)
import Filters.Wikilinks (mapOutsideFences)
import qualified Utils as U
-- | Apply transclusion substitution to the raw Markdown source string.
-- | Apply transclusion substitution to the raw Markdown source string,
-- skipping lines inside fenced code blocks.
preprocess :: String -> String
preprocess = unlines . map processLine . lines
preprocess = mapOutsideFences processLine
processLine :: String -> String
processLine line =


@ -37,6 +37,7 @@
module Filters.Viz (inlineViz) where
import Control.Exception (IOException, catch)
import Data.Char (isHexDigit)
import Data.Maybe (fromMaybe)
import qualified Data.Text as T
import System.Directory (doesFileExist)
@ -117,20 +118,47 @@ runScript baseDir attrs =
-- | Replace hardcoded black fill/stroke values with @currentColor@ so the
-- embedded SVG inherits the CSS text colour in both light and dark modes.
--
-- Quoted attribute forms (@fill="#000"@) are self-delimiting — the
-- closing quote bounds the match — so plain 'T.replace' is safe for
-- them. Unquoted style-property forms (@fill:#000@) are not: naive
-- replacement would also fire on the prefix of a longer hex colour
-- (@fill:#000080@ → @fill:currentColor80@, invalid CSS). Those go
-- through 'replaceHexColor', which rewrites a match only when it is
-- not followed by another hex digit.
processColors :: T.Text -> T.Text
processColors
= T.replace "fill=\"#000\"" "fill=\"currentColor\""
. T.replace "fill=\"black\"" "fill=\"currentColor\""
. T.replace "stroke=\"#000\"" "stroke=\"currentColor\""
. T.replace "stroke=\"black\"" "stroke=\"currentColor\""
. T.replace "fill:#000" "fill:currentColor"
. replaceHexColor "fill:#000" "fill:currentColor"
. T.replace "fill:black" "fill:currentColor"
. T.replace "stroke:#000" "stroke:currentColor"
. replaceHexColor "stroke:#000" "stroke:currentColor"
. T.replace "stroke:black" "stroke:currentColor"
. T.replace "fill=\"#000000\"" "fill=\"currentColor\""
. T.replace "stroke=\"#000000\"" "stroke=\"currentColor\""
. T.replace "fill:#000000" "fill:currentColor"
. T.replace "stroke:#000000" "stroke:currentColor"
. replaceHexColor "fill:#000000" "fill:currentColor"
. replaceHexColor "stroke:#000000" "stroke:currentColor"
-- | 'T.replace' restricted to hex-boundary-terminated matches: an
-- occurrence of @needle@ is rewritten only when the character after
-- it is not another hex digit, so @fill:#000@ never fires inside the
-- longer colours @fill:#0008@, @fill:#000080@, or @fill:#00000080@.
-- (Mirrors 'Filters.Score.replaceHexColor'.)
replaceHexColor :: T.Text -> T.Text -> T.Text -> T.Text
replaceHexColor needle replacement = go
where
go t =
let (pre, rest) = T.breakOn needle t
in if T.null rest
then pre
else
let after = T.drop (T.length needle) rest
in case T.uncons after of
Just (c, _) | isHexDigit c ->
pre <> needle <> go after
_ -> pre <> replacement <> go after
-- ---------------------------------------------------------------------------
-- JSON safety for <script> embedding


@ -12,23 +12,129 @@
-- replaced with hyphens, non-alphanumeric characters stripped, and
-- a @.html@ suffix appended so the link resolves identically under
-- the dev server, file:// previews, and nginx in production.
module Filters.Wikilinks (preprocess) where
--
-- Code protection (honest scope): lines inside /fenced/ code blocks
-- are passed through untouched (see 'mapOutsideFences'), and within a
-- line, inline code spans (backtick runs, CommonMark equal-length
-- matching) are skipped — so both fenced and @`inline`@ examples can
-- show @[[…]]@ literally. Indented code blocks and code spans that
-- cross a line break are NOT recognised; a wikilink inside those is
-- still rewritten.
module Filters.Wikilinks (preprocess, mapOutsideFences) where
import Data.Char (isAlphaNum, toLower, isSpace)
import Data.List (intercalate)
import qualified Utils as U
-- | Scan the raw Markdown source for @[[…]]@ wikilinks and replace them
-- with standard Markdown link syntax.
-- with standard Markdown link syntax. Processing is line-by-line and
-- skips fenced code blocks; a wikilink therefore cannot span a line
-- break (which was never a sensible authoring form).
preprocess :: String -> String
preprocess [] = []
preprocess ('[':'[':rest) =
preprocess = mapOutsideFences replaceWikilinks
replaceWikilinks :: String -> String
replaceWikilinks = go
where
go [] = []
-- Inline code span: a backtick run opens a span closed by a run of
-- exactly the same length (CommonMark). Its body passes through
-- verbatim so documentation can quote @`[[…]]`@ literally. An
-- unclosed run is literal text — and then a following @[[…]]@ is
-- genuinely a wikilink, matching how Pandoc will read the line.
go s@('`':_) =
let (run, afterRun) = span (== '`') s
in case codeSpan (length run) afterRun of
Just (body, after) -> run ++ body ++ run ++ go after
Nothing -> run ++ go afterRun
go ('[':'[':rest) =
case break (== ']') rest of
(inner, ']':']':after)
| not (null inner) ->
toMarkdownLink inner ++ preprocess after
_ -> '[' : '[' : preprocess rest
preprocess (c:rest) = c : preprocess rest
toMarkdownLink inner ++ go after
_ -> '[' : '[' : go rest
go (c:rest) = c : go rest
-- @codeSpan n s@: the span body and the remainder after a closing
-- run of exactly @n@ backticks; 'Nothing' when no closer exists on
-- this line.
codeSpan :: Int -> String -> Maybe (String, String)
codeSpan n = loop
where
loop [] = Nothing
loop s@('`':_) =
let (run, rest) = span (== '`') s
in if length run == n
then Just ("", rest)
else prepend run <$> loop rest
loop (c:cs) = prepend [c] <$> loop cs
prepend pre (body, after) = (pre ++ body, after)
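The inline-code handling can be tried in isolation with the sketch below. It copies `replaceWikilinks` and `codeSpan` as shown in this hunk; `toMarkdownLink` is a hypothetical stand-in, since the real function's body is not part of this diff.

```haskell
-- Hypothetical stand-in for the real 'toMarkdownLink' (not shown in
-- this hunk); it only wraps the inner text so the rewrite is visible.
toMarkdownLink :: String -> String
toMarkdownLink inner = "[" ++ inner ++ "](" ++ inner ++ ".html)"

replaceWikilinks :: String -> String
replaceWikilinks = go
  where
    go [] = []
    -- A backtick run opens a code span closed by an equal-length run.
    go s@('`':_) =
      let (run, afterRun) = span (== '`') s
      in case codeSpan (length run) afterRun of
           Just (body, after) -> run ++ body ++ run ++ go after
           Nothing            -> run ++ go afterRun
    go ('[':'[':rest) =
      case break (== ']') rest of
        (inner, ']':']':after)
          | not (null inner) -> toMarkdownLink inner ++ go after
        _ -> '[' : '[' : go rest
    go (c:rest) = c : go rest

-- Span body and remainder after a closing run of exactly n backticks.
codeSpan :: Int -> String -> Maybe (String, String)
codeSpan n = loop
  where
    loop [] = Nothing
    loop s@('`':_) =
      let (run, rest) = span (== '`') s
      in if length run == n
           then Just ("", rest)
           else prepend run <$> loop rest
    loop (c:cs) = prepend [c] <$> loop cs
    prepend pre (body, after) = (pre ++ body, after)
```

A closed span protects its body, a shorter backtick run inside a double-backtick span does not close it, and an unclosed run is treated as literal text so the wikilink after it is still rewritten.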
-- ---------------------------------------------------------------------------
-- Fence-aware line mapping (shared by all source-level preprocessors)
-- ---------------------------------------------------------------------------
-- | Apply a line transformation to every line that is not part of a
-- fenced code block. Shared by the three source-level preprocessors
-- (wikilinks here, 'Filters.Transclusion', 'Filters.EmbedPdf') so
-- their directive syntax can be quoted literally inside fenced code.
--
-- Fence tracking follows CommonMark: an opener is at most three
-- spaces of indentation followed by a run of at least three backticks
-- or tildes (longer runs allowed); for backtick fences the info
-- string may not contain a backtick. The closer uses the same fence
-- character, a run at least as long as the opener, and nothing but
-- whitespace after it. An unclosed fence extends to the end of the
-- document. Fence delimiter lines themselves pass through untouched.
--
-- Honest scope: only /fenced/ code blocks are protected. Indented
-- code blocks and inline code spans are not recognised here — a
-- directive inside either is still rewritten.
mapOutsideFences :: (String -> String) -> String -> String
mapOutsideFences f = unlines . go Nothing . lines
where
go _ [] = []
go Nothing (l:ls) =
case openingFence l of
Just fence -> l : go (Just fence) ls
Nothing -> f l : go Nothing ls
go st@(Just fence) (l:ls)
| closesFence fence l = l : go Nothing ls
| otherwise = l : go st ls
-- | The fence character and run length of a CommonMark fence opener,
-- or 'Nothing' when the line does not open a fence.
openingFence :: String -> Maybe (Char, Int)
openingFence l = do
rest <- stripFenceIndent l
case rest of
(c:_) | c == '`' || c == '~' ->
let run = takeWhile (== c) rest
n = length run
info = drop n rest
in if n >= 3 && (c == '~' || '`' `notElem` info)
then Just (c, n)
else Nothing
_ -> Nothing
-- | True when the line closes the fence opened by @(c, n)@: the same
-- fence character, a run at least as long as the opener, and only
-- whitespace after it.
closesFence :: (Char, Int) -> String -> Bool
closesFence (c, n) l =
case stripFenceIndent l of
Nothing -> False
Just rest ->
let run = takeWhile (== c) rest
in length run >= n && all isSpace (drop (length run) rest)
-- | Strip up to three leading spaces (the indentation CommonMark allows
-- on a fence line); 'Nothing' for four or more, which would be an
-- indented code block rather than a fence.
stripFenceIndent :: String -> Maybe String
stripFenceIndent l =
let (indent, rest) = span (== ' ') l
in if length indent <= 3 then Just rest else Nothing
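For experimenting with the fence tracking outside the site build, the functions above can be collected into one standalone file, as in this sketch. The code is copied from the hunk; only the example transformation (`map toUpper`) in the usage note is made up.

```haskell
import Data.Char (isSpace)

mapOutsideFences :: (String -> String) -> String -> String
mapOutsideFences f = unlines . go Nothing . lines
  where
    go _ [] = []
    go Nothing (l:ls) =
      case openingFence l of
        Just fence -> l : go (Just fence) ls
        Nothing    -> f l : go Nothing ls
    go st@(Just fence) (l:ls)
      | closesFence fence l = l : go Nothing ls
      | otherwise           = l : go st ls

-- Fence character and run length of an opener, if the line is one.
openingFence :: String -> Maybe (Char, Int)
openingFence l = do
  rest <- stripFenceIndent l
  case rest of
    (c:_) | c == '`' || c == '~' ->
      let n    = length (takeWhile (== c) rest)
          info = drop n rest
      in if n >= 3 && (c == '~' || '`' `notElem` info)
           then Just (c, n)
           else Nothing
    _ -> Nothing

-- Same character, run at least as long, only whitespace afterwards.
closesFence :: (Char, Int) -> String -> Bool
closesFence (c, n) l =
  case stripFenceIndent l of
    Nothing   -> False
    Just rest ->
      let run = takeWhile (== c) rest
      in length run >= n && all isSpace (drop (length run) rest)

-- Up to three leading spaces are allowed on a fence line.
stripFenceIndent :: String -> Maybe String
stripFenceIndent l =
  let (indent, rest) = span (== ' ') l
  in if length indent <= 3 then Just rest else Nothing
```

Applying `mapOutsideFences (map toUpper)` (with `toUpper` from `Data.Char`) to `"a\n~~~\nb\n~~~~\nc\n"` upper-cases only `a` and `c`. One property worth keeping in mind: because the function is `unlines . go . lines`, the output always ends with a newline, even when the input did not.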
-- | Convert the inner content of @[[…]]@ to a Markdown link.
--


@ -230,7 +230,7 @@ data EpistemicData = EpistemicData
, epPeerStatus :: Maybe String -- ^ Validated peer-status slug ('Nothing' when absent / unreviewed / invalid).
, epResultShape :: Maybe String -- ^ Validated result-shape value.
, epStability :: String -- ^ Always one of the five stability labels.
, epTrust :: Int -- ^ Trust score 0–100 (60/40 weighted; @proved@ substitutes 100 for confidence).
, epTrust :: Maybe Int -- ^ Trust score 0–100 (60/40 weighted; @proved@ substitutes 100 for confidence). 'Nothing' when confidence or evidence is missing — no label is rendered.
}
-- | Read the figure inputs from a Hakyll item's metadata + git history.
@ -267,15 +267,16 @@ readEpistemicData item = do
trimS = trim'
-- | Trust score: the same 60/40 weighted composite of confidence and
-- evidence used by 'Contexts.overallScoreField'. Returns 0 when either
-- input is missing — which is fine for the figure (the polygon and
-- trust label simply collapse to the bare frame).
computeTrust :: Maybe Int -> Maybe Int -> Int
-- evidence used by 'Contexts.overallScoreField'. Returns 'Nothing'
-- when either input is missing — the figure then renders no trust
-- label at all (it collapses to the bare frame), rather than a
-- literal "0" indistinguishable from an authored zero score.
computeTrust :: Maybe Int -> Maybe Int -> Maybe Int
computeTrust (Just c) (Just e) =
let raw :: Double
raw = fromIntegral c / 100.0 * 0.6 + fromIntegral (e - 1) / 4.0 * 0.4
in max 0 (min 100 (round (raw * 100.0)))
computeTrust _ _ = 0
in Just (max 0 (min 100 (round (raw * 100.0))))
computeTrust _ _ = Nothing
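A worked example of the new return type, using the function exactly as it appears after the change. The formula is trust = round(100 · (0.6 · c/100 + 0.4 · (e − 1)/4)); the `(e - 1) / 4` term suggests evidence is on a 1 to 5 scale, which is an inference from the code rather than something stated in this hunk.

```haskell
computeTrust :: Maybe Int -> Maybe Int -> Maybe Int
computeTrust (Just c) (Just e) =
  let raw :: Double
      raw = fromIntegral c / 100.0 * 0.6 + fromIntegral (e - 1) / 4.0 * 0.4
  in Just (max 0 (min 100 (round (raw * 100.0))))
computeTrust _ _ = Nothing

-- Sample inputs (illustrative values, not taken from real pages):
--   confidence 80, evidence 3  ->  0.48 + 0.20 = 0.68
--   confidence 0,  evidence 1  ->  an authored zero score
--   confidence missing         ->  no score at all
examples :: [Maybe Int]
examples =
  [ computeTrust (Just 80) (Just 3)
  , computeTrust (Just 0)  (Just 1)
  , computeTrust Nothing   (Just 3)
  ]
```

The second and third cases are the ones the change separates: an authored zero now yields `Just 0`, while a missing input yields `Nothing`, so callers such as `maybe "" renderTrustLabel` can skip the label entirely.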
-- | Same predicate as 'Contexts.isProvedConfidence' — local copy to keep
-- the module's dependency graph light (Marks → Stability only). The
@ -390,15 +391,16 @@ renderEpistemicFigure d = T.concat
[ "<svg xmlns=\"http://www.w3.org/2000/svg\""
, " viewBox=\"0 0 200 200\""
, " role=\"img\""
, " aria-label=\"Epistemic figure: trust ", T.pack (show (epTrust d))
, ", stability ", T.pack (epStability d), "\">"
, " aria-label=\"Epistemic figure: "
, maybe "" (\t -> "trust " <> T.pack (show t) <> ", ") (epTrust d)
, "stability ", T.pack (epStability d), "\">"
, renderRoundel
, renderGuides
, renderAxes
, renderPolygon d
, renderVertexMarks d
, renderTicks (epStability d) (epPeerStatus d)
, renderTrustLabel (epTrust d)
, maybe "" renderTrustLabel (epTrust d)
, renderResultShape (epResultShape d) (epTrust d)
, "</svg>"
]
@ -578,10 +580,11 @@ renderTrustLabel score = T.concat
, " opacity=\"0.7\">TRUST</text>"
]
-- | Result-shape glyph immediately to the right of the trust score.
renderResultShape :: Maybe String -> Int -> T.Text
-- | Result-shape glyph immediately to the right of the trust score —
-- or centred in its place when no trust score is rendered.
renderResultShape :: Maybe String -> Maybe Int -> T.Text
renderResultShape Nothing _ = ""
renderResultShape (Just shape) score =
renderResultShape (Just shape) mScore =
let glyph = case shape of
"positive" -> "+"
"negative" -> "\x2212" -- minus sign (not hyphen-minus)
@ -589,15 +592,20 @@ renderResultShape (Just shape) score =
"comparative" -> "\x223C" -- ∼
"descriptive" -> "\x25A1" -- □
_ -> ""
-- Offset proportional to the trust number's width (digits ≈ 8 px each).
digitCount = length (show score)
-- Offset proportional to the trust number's width (digits ≈ 8 px
-- each); with no trust label the glyph takes the centre itself.
(x, anchor) = case mScore of
Just score ->
let digitCount = length (show score)
offset = fromIntegral digitCount * 4.5 + 3 :: Double
in (fxCenter + offset, "start")
Nothing -> (fxCenter, "middle")
in if T.null (T.pack glyph)
then ""
else T.concat
[ "<text x=\"", ff (fxCenter + offset)
[ "<text x=\"", ff x
, "\" y=\"", ff (fyCenter + 4)
, "\" text-anchor=\"start\""
, "\" text-anchor=\"", anchor, "\""
, " fill=\"currentColor\" stroke=\"none\""
, " font-family=\"Spectral, serif\" font-size=\"16\">"
, T.pack glyph


@ -12,6 +12,7 @@ module Pagination
) where
import Hakyll
import Patterns (blogPattern)
-- | Items per page across most paginated lists (e.g. the blog).
@ -39,7 +40,7 @@ blogPageId n = fromFilePath $ "blog/page/" ++ show n ++ "/index.html"
-- @baseCtx@: site-level context (siteCtx).
blogPaginateRules :: Context String -> Context String -> Rules ()
blogPaginateRules itemCtx baseCtx = do
paginate <- buildPaginateWith sortAndGroup ("content/blog/*.md" .&&. hasNoVersion) blogPageId
paginate <- buildPaginateWith sortAndGroup (blogPattern .&&. hasNoVersion) blogPageId
paginateRules paginate $ \pageNum pat -> do
route idRoute
compile $ do


@ -122,7 +122,14 @@ allWritings :: Pattern
allWritings = essayPattern .||. blogPattern .||. poetryPattern .||. fictionPattern
-- | Every content file the backlinks pass should index. Includes music
-- landing pages and top-level standalone pages, in addition to writings.
-- landing pages and top-level standalone pages, in addition to writings,
-- plus the two directory-form standalone essays (@content/me/index.md@
-- and @content/memento-mori/index.md@) — full essays rendered with
-- backlinks, whose outgoing links must be visible to the link graph.
--
-- Photography is deliberately excluded: photo pages do not render the
-- backlinks block (see 'Contexts.photographyCtx'), and caption-scale
-- entries would add link-graph noise with no consuming surface.
allContent :: Pattern
allContent =
essayPattern
@ -131,6 +138,8 @@ allContent =
.||. fictionPattern
.||. musicPattern
.||. standalonePagesPattern
.||. "content/me/index.md"
.||. "content/memento-mori/index.md"
-- | Content shown on author index pages — essays + blog posts.
-- (Poetry and fiction have their own dedicated indexes and are not


@ -27,7 +27,7 @@ import Data.Maybe (mapMaybe, fromMaybe, catMaybes)
import qualified Data.Set as Set
import Data.Set (Set)
import Data.Ord (Down (..), comparing)
import System.FilePath (takeDirectory, takeFileName, replaceExtension)
import System.FilePath (takeBaseName, takeDirectory, takeFileName, replaceExtension)
import qualified Data.Aeson as Aeson
import Data.Aeson (Value (..), (.=))
import qualified Data.Aeson.KeyMap as KM
@ -305,10 +305,11 @@ stripIndexHtml r
-- * @exact@: 4 decimal places (~10 m)
-- * @km@ : 2 decimal places (~1 km)
-- * @city@ : 1 decimal place (~10 km) — default
-- * other : treated as @city@
-- * other : treated as @city@ (defensive only — 'buildPin' validates
-- the precision and fails closed before consulting this function)
--
-- @hidden@ is handled at the call site by skipping the pin entirely;
-- this function is not consulted in that case.
-- @hidden@ and unrecognised values are handled at the call site by
-- skipping the pin entirely; this function is not consulted then.
roundCoord :: String -> Double -> Double
roundCoord prec x =
let n = case prec of
@ -336,7 +337,10 @@ parseGeo meta = case KM.lookup "geo" meta of
-- | Build a single pin object from a photo entry. Returns 'Nothing'
-- when:
-- * the entry has no @geo:@ frontmatter, or
-- * it has @geo-precision: hidden@, or
-- * @geo-precision:@ is anything other than @exact@/@km@/@city@ —
-- @hidden@ and unrecognised values (typos, wrong case) alike.
-- Failing closed means a typo'd \"hidden\" can never publish
-- coordinates the author meant to suppress.
-- * the entry has no resolvable route (shouldn't happen for
-- photographyPattern items, but be defensive).
buildPin :: Item String -> Compiler (Maybe Value)
@ -345,13 +349,21 @@ buildPin item = do
meta <- getMetadata ident
mRoute <- getRoute ident
case (parseGeo meta, lookupString "geo-precision" meta, mRoute) of
(_, Just "hidden", _) -> return Nothing
(Just (lat, lon), prec, Just r) ->
(Just (lat, lon), prec, Just r)
| maybe True (`elem` ["exact", "km", "city"]) prec ->
let prec' = fromMaybe "city" prec
rLat = roundCoord prec' lat
rLon = roundCoord prec' lon
fp = toFilePath ident
slug = takeFileName (takeDirectory fp)
-- Directory entries (<slug>/index.md) and series children
-- (<series>/<photo>.md) both key assets off the parent
-- directory; a flat single (content/photography/foo.md)
-- has no entry directory, so its slug is its basename and
-- its co-located assets route to /photography/ directly.
isFlat = takeDirectory fp == "content/photography"
&& takeFileName fp /= "index.md"
slug = if isFlat then takeBaseName fp
else takeFileName (takeDirectory fp)
title = fromMaybe slug (lookupString "title" meta)
photo = lookupString "photo" meta
-- Trim trailing "index.html" so the click-through URL
@ -359,7 +371,8 @@ buildPin item = do
url = "/" ++ stripIndexHtml r
thumb = case photo of
Just p | not (null p) ->
"/photography/" ++ slug ++ "/" ++ p
if isFlat then "/photography/" ++ p
else "/photography/" ++ slug ++ "/" ++ p
_ -> ""
captured = lookupString "captured" meta
in return $ Just $ Aeson.object $
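The fail-closed precision check can be illustrated without Hakyll. In the sketch below, `pinCoords` and `roundTo` are hypothetical names; the allow-list and the `city` default come from the guard above, and the decimal places come from the `roundCoord` doc comment. The rounding arithmetic itself is an assumption, since `roundCoord`'s body is not fully shown in this diff.

```haskell
-- Decimal places per precision level, per the 'roundCoord' doc comment.
decimalsFor :: String -> Int
decimalsFor "exact" = 4
decimalsFor "km"    = 2
decimalsFor _       = 1   -- "city"

-- Assumed rounding: scale, round to nearest integer, scale back.
roundTo :: Int -> Double -> Double
roundTo n x = fromIntegral (round (x * 10 ^ n) :: Integer) / 10 ^ n

-- | Coordinates to publish, or Nothing when the pin must be skipped.
-- An absent precision defaults to "city"; anything outside the
-- allow-list ("hidden", typos, wrong case) suppresses the pin.
pinCoords :: Maybe String -> (Double, Double) -> Maybe (Double, Double)
pinCoords prec (lat, lon)
  | maybe True (`elem` ["exact", "km", "city"]) prec =
      let n = decimalsFor (maybe "city" id prec)
      in Just (roundTo n lat, roundTo n lon)
  | otherwise = Nothing
```

Under this model `pinCoords (Just "Hidden") coords` is `Nothing`, the same as the correctly spelled `hidden`: a miscapitalised value suppresses the pin instead of falling through to the `city` default.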
@ -443,13 +456,20 @@ photographyFeedDescription = field "description" $ \item -> do
body <- itemBody <$> (loadSnapshot ident "content" :: Compiler (Item String))
meta <- getMetadata ident
let fp = toFilePath ident
isDir = takeFileName fp == "index.md"
-- Same asset-path derivation as 'buildPin': directory entries
-- (<slug>/index.md) and series children (<series>/<photo>.md)
-- both key assets off the parent directory; a flat single
-- (content/photography/foo.md) has no entry directory, so its
-- co-located assets route to /photography/ directly.
isFlat = takeDirectory fp == "content/photography"
&& takeFileName fp /= "index.md"
slug = takeFileName (takeDirectory fp)
photo = lookupString "photo" meta
imgTag = case (isDir, photo) of
(True, Just p) | not (null p) ->
"<p><img src=\"https://levineuwirth.org/photography/"
++ slug ++ "/" ++ p ++ "\" alt=\"\"></p>\n"
imgTag = case lookupString "photo" meta of
Just p | not (null p) ->
let src = if isFlat then "/photography/" ++ p
else "/photography/" ++ slug ++ "/" ++ p
in "<p><img src=\"https://levineuwirth.org"
++ src ++ "\" alt=\"\"></p>\n"
_ -> ""
return (imgTag ++ body)
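The asset-path derivation shared by `buildPin` and `photographyFeedDescription` can be sketched as three pure functions over the source path. The names `entrySlug` and `assetUrl` are hypothetical; the logic follows the `isFlat` branches shown above, and the sample file names in the usage note are made up.

```haskell
import System.FilePath (takeBaseName, takeDirectory, takeFileName)

-- | A flat single lives directly under content/photography and is not
-- the section's own index.md.
isFlat :: FilePath -> Bool
isFlat fp = takeDirectory fp == "content/photography"
         && takeFileName fp /= "index.md"

-- | Flat singles use their basename; directory entries and series
-- children use the parent directory name.
entrySlug :: FilePath -> String
entrySlug fp
  | isFlat fp = takeBaseName fp
  | otherwise = takeFileName (takeDirectory fp)

-- | Site-relative URL of a co-located asset named in @photo:@.
assetUrl :: FilePath -> FilePath -> String
assetUrl fp photo
  | isFlat fp = "/photography/" ++ photo
  | otherwise = "/photography/" ++ entrySlug fp ++ "/" ++ photo
```

For example, `assetUrl "content/photography/foo.md" "foo.jpg"` gives `/photography/foo.jpg`, while both `content/photography/alps/index.md` and the series child `content/photography/alps/peak.md` resolve assets under `/photography/alps/`.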


@ -49,7 +49,8 @@ instance Aeson.FromJSON SimilarEntry where
-- ---------------------------------------------------------------------------
-- | Maximum entries rendered in the "Related" block. The on-disk JSON may
-- contain more (embed.py's TOP_N); the template caps the display.
-- contain more (embed.py's TOP_N); 'similarLinksField' caps the list
-- (@take maxSimilar@) before rendering.
maxSimilar :: Int
maxSimilar = 3
@ -101,10 +102,10 @@ normaliseUrl url =
-- | Percent-decode @%XX@ escapes (UTF-8) so percent-encoded paths
-- collide with their decoded form on map lookup. Mirrors
-- 'Backlinks.percentDecode'; the two implementations are intentionally
-- duplicated because they apply different normalisations *before*
-- decoding (Backlinks strips @.html@ unconditionally; SimilarLinks
-- preserves the trailing-slash form for index pages).
-- 'Backlinks.percentDecode' (and 'Backlinks.normaliseUrl' now applies
-- the same strip-@index.html@-then-@.html@ normalisation as this
-- module); the duplication keeps the two modules dependency-free of
-- each other.
percentDecode :: String -> String
percentDecode = T.unpack . TE.decodeUtf8With TE.lenientDecode . BS.pack . go
where
@ -121,6 +122,25 @@ percentDecode = T.unpack . TE.decodeUtf8With TE.lenientDecode . BS.pack . go
| c >= 'A' && c <= 'F' = Just (fromEnum c - fromEnum 'A' + 10)
| otherwise = Nothing
-- | Percent-encode a string for use as a URI query value: RFC 3986
-- unreserved characters pass through; everything else — including @&@,
-- @?@, @#@, spaces, and non-ASCII text via its UTF-8 bytes — becomes
-- @%XX@. Hand-rolled (the moral equivalent of network-uri's
-- @escapeURIString isUnreserved@) because network-uri is not otherwise
-- a dependency. The output is also HTML-attribute-safe: it contains
-- only unreserved characters and @%XX@ escapes.
percentEncode :: String -> String
percentEncode = concatMap enc . BS.unpack . TE.encodeUtf8 . T.pack
where
enc b
| unreserved b = [toEnum (fromIntegral b)]
| otherwise = ['%', hexDigit (b `div` 16), hexDigit (b `mod` 16)]
unreserved b =
let c = toEnum (fromIntegral b) :: Char
in (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
|| (c >= '0' && c <= '9') || c `elem` ("-._~" :: String)
hexDigit n = "0123456789ABCDEF" !! fromIntegral n
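A standalone sketch of the encoder together with the path/fragment split that `renderPdf` performs, assuming the `text` and `bytestring` packages. `viewerUrl` is a hypothetical helper name, and it omits the `escapeHtml` call that the real code applies to the fragment.

```haskell
import qualified Data.ByteString as BS
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- | Percent-encode everything except RFC 3986 unreserved characters,
-- working on the UTF-8 bytes of the input.
percentEncode :: String -> String
percentEncode = concatMap enc . BS.unpack . TE.encodeUtf8 . T.pack
  where
    enc b
      | unreserved b = [toEnum (fromIntegral b)]
      | otherwise    = ['%', hexDigit (b `div` 16), hexDigit (b `mod` 16)]
    unreserved b =
      let c = toEnum (fromIntegral b) :: Char
      in (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
         || (c >= '0' && c <= '9') || c `elem` "-._~"
    hexDigit n = "0123456789ABCDEF" !! fromIntegral n

-- | Hypothetical helper: encode the PDF path as the file= query value
-- and keep any #fragment as a fragment of the viewer URL itself.
viewerUrl :: String -> String
viewerUrl raw =
  let (path, frag) = break (== '#') raw
  in "/pdfjs/web/viewer.html?file=" ++ percentEncode path ++ frag
```

Note that `/` is not in the unreserved set, so a path such as `/papers/a&b.pdf#page=3` becomes `file=%2Fpapers%2Fa%26b.pdf` followed by the untouched `#page=3`.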
-- ---------------------------------------------------------------------------
-- HTML rendering
-- ---------------------------------------------------------------------------
@ -153,8 +173,14 @@ renderSimilarLinks entries =
++ "</a></li>\n"
renderPdf se =
-- The PDF path becomes the @file=@ query value, so it must be
-- percent-encoded (HTML escaping alone leaves @&@/@?@/@#@/spaces
-- free to break the query). A @#page=N@ fragment stays a fragment
-- of the viewer URL itself — PDF.js reads it from location.hash.
let raw = seUrl se
viewerUrl = "/pdfjs/web/viewer.html?file=" ++ escapeHtml raw
(path, frag) = break (== '#') raw
viewerUrl = "/pdfjs/web/viewer.html?file="
++ percentEncode path ++ escapeHtml frag
in "<li class=\"similar-links-item\">"
++ "<a class=\"similar-link pdf-link\""
++ " href=\"" ++ viewerUrl ++ "\""


@ -31,7 +31,7 @@ import Commonplace (commonplaceCtx)
import Now (nowCtx)
import Contexts (siteCtx, essayCtx, postCtx, pageCtx, poetryCtx, fictionCtx, compositionCtx,
contentKindField, recentFirstByDisplay,
tagLinksFieldExcludingTopSegment)
tagLinksFieldExcludingTopSegment, isProvedConfidence)
import qualified Patterns as P
import Photography (photographyRules)
import Tags (buildAllTags, applyTagRules, sidecarIdentifier,
@ -40,7 +40,7 @@ import Pagination (blogPaginateRules)
import Stats (statsRules)
-- | Home-page portal grid order. Canonical ordering authority for every
-- rendering of the eight portals (currently: the home page; future
-- rendering of the portals (currently: the home page; future
-- consumers follow this list). Each entry is (display name, tag name);
-- the tag name is the key to everything else — URL (@/\<tag\>/@),
-- sidecar path (@content\/tag-meta\/\<tag\>.md@), and the Tags.hs
@ -73,13 +73,17 @@ libraryShelfMax = 5
libraryIntroId :: Identifier
libraryIntroId = fromFilePath "content/library.md"
-- Poems inside collection subdirectories, excluding their index pages.
collectionPoems :: Pattern
collectionPoems = "content/poetry/*/*.md" .&&. complement "content/poetry/*/index.md"
-- All poetry content (flat + collection), excluding collection index pages.
allPoetry :: Pattern
allPoetry = "content/poetry/*.md" .||. collectionPoems
-- | Route that strips a literal prefix from the identifier's path.
-- Hakyll's 'gsubRoute' replaces /every/ occurrence of its pattern, so
-- @gsubRoute "content/"@ would also mangle a co-located directory that
-- happened to be named @content@ deeper in the path
-- (@content/essays/slug/content/data.csv@ → @essays/slug/data.csv@).
-- This touches only the leading occurrence; identifiers that don't
-- start with the prefix pass through unchanged.
stripPrefixRoute :: String -> Routes
stripPrefixRoute prefix = customRoute $ \ident ->
let fp = toFilePath ident
in fromMaybe fp (stripPrefix prefix fp)
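The difference between the two routing strategies can be seen on plain strings, without Hakyll. In this sketch `stripLeading` is the pure core of `stripPrefixRoute`, and `replaceAll` is a simple model of replace-every-occurrence behaviour written for illustration; it is not Hakyll's `gsubRoute` implementation.

```haskell
import Data.List (stripPrefix)
import Data.Maybe (fromMaybe)

-- | Remove the prefix only when the path starts with it.
stripLeading :: String -> FilePath -> FilePath
stripLeading prefix fp = fromMaybe fp (stripPrefix prefix fp)

-- | Illustrative model: replace every occurrence of a non-empty
-- pattern, wherever it appears in the path.
replaceAll :: String -> String -> String -> String
replaceAll pat rep = go
  where
    go [] = []
    go s@(c:cs) =
      case stripPrefix pat s of
        Just rest -> rep ++ go rest
        Nothing   -> c : go cs

-- The path from the doc comment above.
sample :: FilePath
sample = "content/essays/slug/content/data.csv"
```

`stripLeading "content/" sample` yields `essays/slug/content/data.csv`, preserving the nested directory, whereas `replaceAll "content/" "" sample` yields `essays/slug/data.csv`, which is the mangling the doc comment describes.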
feedConfig :: FeedConfiguration
feedConfig = FeedConfiguration
@ -168,18 +172,18 @@ rules = do
-- Per-page JS files — authored alongside content in content/**/*.js.
-- Draft JS is handled by a separate dev-only rule below.
match ("content/**/*.js" .&&. complement "content/drafts/**") $ do
route $ gsubRoute "content/" (const "")
route $ stripPrefixRoute "content/"
compile copyFileCompiler
-- Per-page JS co-located with draft essays (dev-only).
when isDev $ match "content/drafts/**/*.js" $ do
route $ gsubRoute "content/" (const "")
route $ stripPrefixRoute "content/"
compile copyFileCompiler
-- CSS — must be matched before the broad static/** rule to avoid
-- double-matching (compressCssCompiler vs. copyFileCompiler).
match "static/css/*" $ do
route $ gsubRoute "static/" (const "")
route $ stripPrefixRoute "static/"
compile compressCssCompiler
-- All other static files (fonts, JS, images, …). Build-time
@ -192,7 +196,7 @@ rules = do
.&&. complement "static/**/*.exif.yaml"
.&&. complement "static/**/*.palette.yaml"
) $ do
route $ gsubRoute "static/" (const "")
route $ stripPrefixRoute "static/"
compile copyFileCompiler
-- Templates
@ -299,7 +303,7 @@ rules = do
-- SVG score fragments co-located with me/index.md.
match "content/me/scores/*.svg" $ do
route $ gsubRoute "content/me/" (const "")
route $ stripPrefixRoute "content/me/"
compile copyFileCompiler
-- memento-mori/index.md — lives in its own directory so co-located SVG
@ -315,7 +319,7 @@ rules = do
-- SVG score fragments co-located with memento-mori/index.md.
match "content/memento-mori/scores/*.svg" $ do
route $ gsubRoute "content/memento-mori/" (const "")
route $ stripPrefixRoute "content/memento-mori/"
compile copyFileCompiler
-- ---------------------------------------------------------------------------
@ -354,7 +358,7 @@ rules = do
.&&. complement "content/colophon.md"
.&&. complement "content/current.md"
.&&. complement "content/library.md") $ do
route $ gsubRoute "content/" (const "")
route $ stripPrefixRoute "content/"
`composeRoutes` setExtension "html"
compile $ pageCompiler
>>= loadAndApplyTemplate "templates/page.html" pageCtx
@ -414,7 +418,7 @@ rules = do
.&&. complement "content/essays/*.md"
.&&. complement "content/essays/*/index.md"
.&&. complement "content/essays/**/*.dims.yaml") $ do
route $ gsubRoute "content/" (const "")
route $ stripPrefixRoute "content/"
compile copyFileCompiler
-- Static assets co-located with draft essays (dev-only).
@ -422,14 +426,14 @@ rules = do
.&&. complement "content/drafts/essays/*.md"
.&&. complement "content/drafts/essays/*/index.md"
.&&. complement "content/drafts/essays/**/*.dims.yaml") $ do
route $ gsubRoute "content/" (const "")
route $ stripPrefixRoute "content/"
compile copyFileCompiler
-- ---------------------------------------------------------------------------
-- Blog posts
-- ---------------------------------------------------------------------------
match "content/blog/*.md" $ do
route $ gsubRoute "content/blog/" (const "blog/")
route $ stripPrefixRoute "content/"
`composeRoutes` setExtension "html"
compile $ postCompiler
>>= saveSnapshot "content"
@ -440,19 +444,12 @@ rules = do
-- ---------------------------------------------------------------------------
-- Poetry
-- ---------------------------------------------------------------------------
-- Flat poems (e.g. content/poetry/sonnet-60.md)
match "content/poetry/*.md" $ do
route $ gsubRoute "content/poetry/" (const "poetry/")
`composeRoutes` setExtension "html"
compile $ poetryCompiler
>>= saveSnapshot "content"
>>= loadAndApplyTemplate "templates/reading.html" poetryCtx
>>= loadAndApplyTemplate "templates/default.html" poetryCtx
>>= relativizeUrls
-- Collection poems (e.g. content/poetry/shakespeare-sonnets/sonnet-1.md)
match collectionPoems $ do
route $ gsubRoute "content/poetry/" (const "poetry/")
-- All poems — flat (content/poetry/sonnet-60.md) and collection
-- (content/poetry/shakespeare-sonnets/sonnet-1.md) forms share one
-- rule; collection index pages are excluded by 'P.poetryPattern'
-- itself and matched separately below.
match P.poetryPattern $ do
route $ stripPrefixRoute "content/"
`composeRoutes` setExtension "html"
compile $ poetryCompiler
>>= saveSnapshot "content"
@ -462,7 +459,7 @@ rules = do
-- Collection index pages (e.g. content/poetry/shakespeare-sonnets/index.md)
match "content/poetry/*/index.md" $ do
route $ gsubRoute "content/poetry/" (const "poetry/")
route $ stripPrefixRoute "content/"
`composeRoutes` setExtension "html"
compile $ pageCompiler
>>= loadAndApplyTemplate "templates/default.html" pageCtx
@ -472,7 +469,7 @@ rules = do
-- Fiction
-- ---------------------------------------------------------------------------
match "content/fiction/*.md" $ do
route $ gsubRoute "content/fiction/" (const "fiction/")
route $ stripPrefixRoute "content/"
`composeRoutes` setExtension "html"
compile $ fictionCompiler
>>= saveSnapshot "content"
@ -496,20 +493,20 @@ rules = do
-- Static assets (SVG score pages, audio, PDF) served unchanged.
match "content/music/**/*.svg" $ do
route $ gsubRoute "content/" (const "")
route $ stripPrefixRoute "content/"
compile copyFileCompiler
match "content/music/**/*.mp3" $ do
route $ gsubRoute "content/" (const "")
route $ stripPrefixRoute "content/"
compile copyFileCompiler
match "content/music/**/*.pdf" $ do
route $ gsubRoute "content/" (const "")
route $ stripPrefixRoute "content/"
compile copyFileCompiler
-- Landing page — full essay pipeline.
match "content/music/*/index.md" $ do
route $ gsubRoute "content/" (const "")
route $ stripPrefixRoute "content/"
`composeRoutes` setExtension "html"
compile $ compositionCompiler
>>= saveSnapshot "content"
@ -566,6 +563,46 @@ rules = do
>>= loadAndApplyTemplate "templates/default.html" ctx
>>= relativizeUrls
-- ---------------------------------------------------------------------------
-- Poetry index
-- ---------------------------------------------------------------------------
-- Nav, the home portal grid, and the library all link /poetry/; this
-- rule is what keeps those links from 404ing. Lists flat poems and
-- collection poems alike; collection index pages are excluded by
-- 'P.poetryPattern' itself.
create ["poetry/index.html"] $ do
route idRoute
compile $ do
poems <- recentFirst =<< loadAll (P.poetryPattern .&&. hasNoVersion)
let ctx =
listField "essays" poetryCtx (return poems)
<> constField "title" "Poetry"
<> constField "portal" "true"
<> siteCtx
makeItem ""
>>= loadAndApplyTemplate "templates/essay-index.html" ctx
>>= loadAndApplyTemplate "templates/default.html" ctx
>>= relativizeUrls
-- ---------------------------------------------------------------------------
-- Fiction index
-- ---------------------------------------------------------------------------
-- Same rationale as the poetry index. content/fiction/ has no entries
-- yet; an empty match list renders an empty index rather than a 404.
create ["fiction/index.html"] $ do
route idRoute
compile $ do
stories <- recentFirst =<< loadAll (P.fictionPattern .&&. hasNoVersion)
let ctx =
listField "essays" fictionCtx (return stories)
<> constField "title" "Fiction"
<> constField "portal" "true"
<> siteCtx
makeItem ""
>>= loadAndApplyTemplate "templates/essay-index.html" ctx
>>= loadAndApplyTemplate "templates/default.html" ctx
>>= relativizeUrls
-- ---------------------------------------------------------------------------
-- New page — all content sorted by creation date, newest first
-- ---------------------------------------------------------------------------
@ -573,10 +610,10 @@ rules = do
route idRoute
compile $ do
let allContent = ( allEssays
.||. "content/blog/*.md"
.||. "content/fiction/*.md"
.||. allPoetry
.||. "content/music/*/index.md"
.||. P.blogPattern
.||. P.fictionPattern
.||. P.poetryPattern
.||. P.musicPattern
) .&&. hasNoVersion
items <- recentFirstByDisplay =<< loadAll allContent
let itemCtx = contentKindField
@ -601,7 +638,7 @@ rules = do
-- Library — portal-grouped view over the /new.html dataset, deduplicated
-- by primary portal. An item's primary portal is the top segment of the
-- first tag in its frontmatter 'tags:' list whose top segment matches a
-- known portal (the eight in 'homePortals'). Items with no such tag are
-- known portal (those in 'homePortals'). Items with no such tag are
-- silently dropped from the library (they remain on /new.html and on any
-- tag pages their frontmatter produces).
--
@ -629,9 +666,11 @@ rules = do
-- Top segment of the first tag that names a known portal.
-- Nothing when no tag matches — item is excluded from library.
-- Reads tags via 'getTags' (not lookupStringList) so the
-- scalar comma form ("tags: research, ai") is accepted with
-- the same semantics the tag pages use.
primaryPortalOf item = do
meta <- getMetadata (itemIdentifier item)
let ts = fromMaybe [] (lookupStringList "tags" meta)
ts <- getTags (itemIdentifier item)
return $ listToMaybe
[ p | t <- ts
, let p = takeWhile (/= '/') t
@ -654,12 +693,12 @@ rules = do
-- Load every content item once, then partition by primary portal
-- so each shelf draws from a pre-filtered list rather than
-- re-scanning the whole corpus nine times.
-- re-scanning the whole corpus once per portal.
essays <- loadAll (allEssays .&&. hasNoVersion)
posts <- loadAll ("content/blog/*.md" .&&. hasNoVersion)
fiction <- loadAll ("content/fiction/*.md" .&&. hasNoVersion)
poetry <- loadAll (allPoetry .&&. hasNoVersion)
music <- loadAll ("content/music/*/index.md" .&&. hasNoVersion)
posts <- loadAll (P.blogPattern .&&. hasNoVersion)
fiction <- loadAll (P.fictionPattern .&&. hasNoVersion)
poetry <- loadAll (P.poetryPattern .&&. hasNoVersion)
music <- loadAll (P.musicPattern .&&. hasNoVersion)
photos <- loadAll (P.photographyPattern .&&. hasNoVersion)
let allContent = essays ++ posts ++ fiction ++ poetry ++ music ++ photos
:: [Item String]
@ -668,21 +707,30 @@ rules = do
itemsByPortal =
Map.fromListWith (++) [(p, [i]) | (Just p, i) <- tagged]
-- Eager snapshot load registers the library-intro dependency
-- unconditionally, so a first-populate of content/library.md
-- re-renders the library page even when the gate was previously
-- false (see 'sidecarContext' in Tags.hs for the same pattern).
-- Existence-guarded, like the sidecar contexts in Tags.hs:
-- deleting content/library.md degrades to a library page with
-- no intro block rather than failing the whole compile. When
-- the file exists, the eager snapshot load registers the
-- library-intro dependency unconditionally, so a first-populate
-- of content/library.md re-renders the library page even when
-- the gate was previously false (see 'sidecarContext' in
-- Tags.hs for the same pattern).
introIds <- getMatches "content/library.md"
libraryIntroFld <-
if libraryIntroId `elem` introIds
then do
_ <- loadSnapshot libraryIntroId "body" :: Compiler (Item String)
let libraryIntroFld = field "library-intro" $ \_ -> do
return $ field "library-intro" $ \_ -> do
html <- itemBody <$> loadSnapshot libraryIntroId "body"
if all isSpace html
then noResult "empty library intro"
else return html
else return mempty
-- One shelf's context contribution: the @<slug>-entries@
-- listField (or absent via noResult when the shelf is
-- empty) plus an optional @<slug>-has-more@ gate.
portalSection p = do
let portalSection p = do
let portalItems = fromMaybe [] (Map.lookup p itemsByPortal)
sorted <- recentFirstByDisplay portalItems
@ -763,10 +811,10 @@ rules = do
bibKwMap = invertKeywordsBib bibExtrasAll
writingIds <- getMatches $ (P.essayPattern
.||. "content/blog/*.md"
.||. "content/fiction/*.md"
.||. P.blogPattern
.||. P.fictionPattern
.||. P.poetryPattern
.||. "content/music/*/index.md")
.||. P.musicPattern)
.&&. hasNoVersion
writingKwPairs <- forM writingIds $ \ident -> do
@ -863,15 +911,17 @@ rules = do
>>= relativizeUrls
-- ---------------------------------------------------------------------------
-- Random page manifest — essays + blog posts only (no pagination/index pages)
-- Random page manifest — essays, blog posts, fiction, and poetry (flat
-- and collection poems alike). No pagination/index pages; music and
-- photography landings are also excluded.
-- ---------------------------------------------------------------------------
create ["random-pages.json"] $ do
route idRoute
compile $ do
essays <- loadAll (allEssays .&&. hasNoVersion) :: Compiler [Item String]
posts <- loadAll ("content/blog/*.md" .&&. hasNoVersion) :: Compiler [Item String]
fiction <- loadAll ("content/fiction/*.md" .&&. hasNoVersion) :: Compiler [Item String]
poetry <- loadAll ("content/poetry/*.md" .&&. hasNoVersion) :: Compiler [Item String]
posts <- loadAll (P.blogPattern .&&. hasNoVersion) :: Compiler [Item String]
fiction <- loadAll (P.fictionPattern .&&. hasNoVersion) :: Compiler [Item String]
poetry <- loadAll (P.poetryPattern .&&. hasNoVersion) :: Compiler [Item String]
routes <- mapM (getRoute . itemIdentifier) (essays ++ posts ++ fiction ++ poetry)
let urls = [ "/" ++ r | Just r <- routes ]
makeItem $ LBS.unpack (Aeson.encode urls)
@ -885,10 +935,10 @@ rules = do
route idRoute
compile $ do
essays <- loadAll (allEssays .&&. hasNoVersion) :: Compiler [Item String]
posts <- loadAll ("content/blog/*.md" .&&. hasNoVersion) :: Compiler [Item String]
fiction <- loadAll ("content/fiction/*.md" .&&. hasNoVersion) :: Compiler [Item String]
poetry <- loadAll (allPoetry .&&. hasNoVersion) :: Compiler [Item String]
music <- loadAll ("content/music/*/index.md" .&&. hasNoVersion) :: Compiler [Item String]
posts <- loadAll (P.blogPattern .&&. hasNoVersion) :: Compiler [Item String]
fiction <- loadAll (P.fictionPattern .&&. hasNoVersion) :: Compiler [Item String]
poetry <- loadAll (P.poetryPattern .&&. hasNoVersion) :: Compiler [Item String]
music <- loadAll (P.musicPattern .&&. hasNoVersion) :: Compiler [Item String]
let items = essays ++ posts ++ fiction ++ poetry ++ music
pairs <- mapM epistemicEntry items
let metaMap = Map.fromList (catMaybes pairs)
@ -903,10 +953,10 @@ rules = do
posts <- fmap (take 30) . recentFirst
=<< loadAllSnapshots
( ( allEssays
.||. "content/blog/*.md"
.||. "content/fiction/*.md"
.||. allPoetry
.||. "content/music/*/index.md"
.||. P.blogPattern
.||. P.fictionPattern
.||. P.poetryPattern
.||. P.musicPattern
)
.&&. hasNoVersion
)
@ -926,7 +976,7 @@ rules = do
compile $ do
compositions <- recentFirst
=<< loadAllSnapshots
("content/music/*/index.md" .&&. hasNoVersion)
(P.musicPattern .&&. hasNoVersion)
"content"
let feedCtx =
dateField "updated" "%Y-%m-%dT%H:%M:%SZ"
@ -966,10 +1016,10 @@ rules = do
entries <- recentFirst
=<< loadAllSnapshots
( ( allEssays
.||. "content/blog/*.md"
.||. "content/fiction/*.md"
.||. allPoetry
.||. "content/music/*/index.md"
.||. P.blogPattern
.||. P.fictionPattern
.||. P.poetryPattern
.||. P.musicPattern
)
.&&. hasNoVersion
)
@ -1011,8 +1061,12 @@ epistemicEntry item = do
, grab "stability" meta
]
obj = Map.fromList fields
-- Compute overall-score the same way Contexts.overallScoreField does.
obj' = case ( readMaybe =<< lookupString "confidence" meta :: Maybe Int
-- Compute overall-score the same way Contexts.overallScoreField
-- does, including the "proved"/"proven" sentinel -> 100.
confRaw = lookupString "confidence" meta
confInt | isProvedConfidence confRaw = Just 100
| otherwise = readMaybe =<< confRaw :: Maybe Int
obj' = case ( confInt
, readMaybe =<< lookupString "evidence" meta :: Maybe Int
) of
(Just conf, Just ev) ->

View File

@ -33,8 +33,11 @@ import Control.Exception (catch, IOException)
import Data.Aeson (Value (..))
import qualified Data.Aeson.KeyMap as KM
import qualified Data.Vector as V
import Data.List (sortBy)
import Data.Maybe (catMaybes, fromMaybe, listToMaybe)
import Data.Ord (comparing, Down (..))
import Data.Time.Calendar (Day, diffDays)
import Data.Time.Clock (getCurrentTime, utctDay)
import Data.Time.Format (parseTimeM, formatTime, defaultTimeLocale)
import qualified Data.Text as T
import qualified Data.Text.IO as TIO
@ -85,14 +88,8 @@ gitDates fp = do
parseIso :: String -> Maybe Day
parseIso = parseTimeM True defaultTimeLocale "%Y-%m-%d"
-- | Approximate day-span between the oldest and newest ISO date strings.
daySpan :: String -> String -> Int
daySpan oldest newest =
case (parseIso oldest, parseIso newest) of
(Just o, Just n) -> fromIntegral (abs (diffDays n o))
_ -> 0
-- | Derive stability label from commit dates (newest-first).
-- | Derive stability label from commit dates (newest-first), judged as
-- of @today@.
--
-- Thresholds (commit count + age in days since first commit):
--
@ -104,13 +101,18 @@ daySpan oldest newest =
--
-- These cliffs are deliberately conservative: a fast burst of commits
-- early in a piece's life looks volatile until enough time has passed
-- to demonstrate it has settled.
stabilityFromDates :: [String] -> String
stabilityFromDates [] = "volatile"
stabilityFromDates dates@(newest : _) =
-- 'last' is safe: the (newest:_) pattern guarantees non-empty.
classify (length dates) (daySpan (last dates) newest)
-- to demonstrate it has settled. Age is measured from the first commit
-- to /today/, not to the most recent commit — a piece written in a
-- one-week burst must be able to stabilise as quiet time accumulates.
stabilityFromDates :: Day -> [String] -> String
stabilityFromDates _ [] = "volatile"
stabilityFromDates today dates =
classify (length dates) ageDays
where
-- 'last' is safe: the [] case is handled above.
ageDays = case parseIso (last dates) of
Just firstDay -> fromIntegral (diffDays today firstDay)
Nothing -> 0
classify n age
| n <= 1 || age < volatileAge = "volatile"
| n <= 5 && age < revisingAge = "revising"
@ -149,7 +151,9 @@ resolveStability item = do
ignored <- readIgnore
if srcPath `elem` ignored
then return $ fromMaybe "volatile" (lookupString "stability" meta)
else stabilityFromDates <$> gitDates srcPath
else do
today <- utctDay <$> getCurrentTime
stabilityFromDates today <$> gitDates srcPath
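A minimal sketch of why the age measure changed, assuming the thresholds themselves are untouched. `spanDays` reproduces the removed `daySpan` logic and `ageDays` the new `where` binding; both names and the sample commit dates are illustrative, not part of the change.

```haskell
import Data.Time.Calendar (Day, diffDays, fromGregorian)
import Data.Time.Format (defaultTimeLocale, parseTimeM)

parseIso :: String -> Maybe Day
parseIso = parseTimeM True defaultTimeLocale "%Y-%m-%d"

-- Old measure (illustrative name): first commit to newest commit.
spanDays :: [String] -> Int
spanDays [] = 0
spanDays dates@(newest : _) =
  case (parseIso (last dates), parseIso newest) of
    (Just o, Just n) -> fromIntegral (abs (diffDays n o))
    _                -> 0

-- New measure (illustrative name): first commit to today.
ageDays :: Day -> [String] -> Int
ageDays _ [] = 0
ageDays today dates =
  case parseIso (last dates) of
    Just firstDay -> fromIntegral (diffDays today firstDay)
    Nothing       -> 0

main :: IO ()
main = do
  -- Hypothetical one-week burst, newest first, judged five months later.
  let commits = ["2026-01-07", "2026-01-03", "2026-01-01"]
      today   = fromGregorian 2026 6 10
  print (spanDays commits)       -- 6
  print (ageDays today commits)  -- 160
```

Under the old measure the piece stays 6 days old forever; under the new one its age keeps growing while no commits land.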
-- | Context field @$stability$@.
-- Always resolves to a label; prefers frontmatter when the file is pinned.
@ -166,7 +170,9 @@ lastReviewedField = field "last-reviewed" $ \item -> do
mDate <- unsafeCompiler $ do
ignored <- readIgnore
if srcPath `elem` ignored
then return $ lookupString "last-reviewed" meta
-- Frontmatter convention is ISO; format it like the git
-- branch so pinned pages don't render a raw "2026-05-01".
then return $ fmtIso <$> lookupString "last-reviewed" meta
else fmap fmtIso . listToMaybe <$> gitDates srcPath
case mDate of
Nothing -> fail "no last-reviewed"
@ -228,14 +234,21 @@ versionHistoryHeadCount = 3
-- | Load version-history entries for an item.
-- Priority: frontmatter @history:@ list → git log dates → empty.
--
-- Entries are sorted newest-first by ISO date regardless of authored
-- order: every consumer (primary/rest split, range fields) assumes the
-- head is the newest entry, and the @history:@ list may be authored in
-- either direction. Git dates already arrive newest-first; the sort is
-- idempotent there.
loadVersionHistory :: Item a -> Compiler [VHEntry]
loadVersionHistory item = do
let srcPath = toFilePath (itemIdentifier item)
meta <- getMetadata (itemIdentifier item)
let fmEntries = parseFmHistory meta
let newestFirst = sortBy (comparing (Down . vhDateIso))
fmEntries = newestFirst (parseFmHistory meta)
if not (null fmEntries)
then return fmEntries
else unsafeCompiler (gitLogHistory srcPath)
else unsafeCompiler (newestFirst <$> gitLogHistory srcPath)
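A small sketch of the sort applied above, assuming ISO `YYYY-MM-DD` strings so that lexicographic order matches chronological order. The `VHEntry` record here is a hypothetical stand-in: only `vhDateIso` appears in the diff, and `vhNote` is an assumption.

```haskell
import Data.List (sortBy)
import Data.Ord (Down (..), comparing)

-- Hypothetical stand-in for the real 'VHEntry'.
data VHEntry = VHEntry
  { vhDateIso :: String
  , vhNote    :: String  -- assumed field, for illustration only
  } deriving (Show, Eq)

newestFirst :: [VHEntry] -> [VHEntry]
newestFirst = sortBy (comparing (Down . vhDateIso))

main :: IO ()
main = do
  -- Authored oldest-first, as a frontmatter history list may be.
  let authored = [ VHEntry "2026-03-01" "Initial draft"
                 , VHEntry "2026-03-14" "Expanded typography" ]
      sorted   = newestFirst authored
  mapM_ print sorted
  -- Sorting an already newest-first list leaves it unchanged.
  print (newestFirst sorted == sorted)
```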
-- | Wrap a list of 'VHEntry' as Hakyll Items with unique paths so the
-- list field works correctly inside @$for$@.

View File

@ -156,21 +156,35 @@ stripHtmlTags = go
skipApos (_:rs) = skipApos rs
skipApos [] = []
-- | Normalise a page URL for backlink map lookup (strip trailing .html).
-- | Normalise a page URL for backlink map lookup. Must mirror
-- 'Backlinks.normaliseUrl': strip a trailing @index.html@ (keeping the
-- directory slash) before the bare @.html@ extension, so the keys this
-- produces match the keys written into @data/backlinks.json@.
normUrl :: String -> String
normUrl u
| "index.html" `isSuffixOf` u = take (length u - 10) u
| ".html" `isSuffixOf` u = take (length u - 5) u
| otherwise = u
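A standalone copy of the revised `normUrl`, run on a few sample URLs to show the order of the two suffix checks. The inputs are illustrative.

```haskell
import Data.List (isSuffixOf)

normUrl :: String -> String
normUrl u
  | "index.html" `isSuffixOf` u = take (length u - 10) u  -- keep the directory slash
  | ".html"      `isSuffixOf` u = take (length u - 5) u
  | otherwise                   = u

main :: IO ()
main = mapM_ (putStrLn . normUrl)
  [ "/essays/index.html"  -- /essays/
  , "/me.html"            -- /me
  , "/essays/"            -- unchanged
  ]
```

If the `.html` check ran first, `/essays/index.html` would normalise to `/essays/index` and miss the key.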
pad2 :: (Show a, Integral a) => a -> String
pad2 n = if n < 10 then "0" ++ show n else show n
-- | Median of a non-empty list; returns 0 for empty.
-- | Median of a non-empty list; returns 0 for empty. An even-length
-- list takes the mean of the two middle elements, rounded to the
-- nearest unit.
median :: [Int] -> Int
median [] = 0
median xs = sort xs !! (length xs `div` 2)
-- Index is < length xs for non-empty xs, so '(!!)' is safe here
-- by construction. The empty case is caught by the first equation.
median xs
| odd n = upper
| otherwise = (lower + upper + 1) `div` 2
where
-- Indexes are in range for non-empty xs (lower is consulted only
-- when n >= 2), so '(!!)' is safe here by construction. The empty
-- case is caught by the first equation.
sorted = sort xs
n = length sorted
upper = sorted !! (n `div` 2)
lower = sorted !! (n `div` 2 - 1)
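A quick check of the rounding rule, using a standalone copy of the new definition. The sample lists are made up for illustration.

```haskell
import Data.List (sort)

median :: [Int] -> Int
median [] = 0
median xs
  | odd n     = upper
  | otherwise = (lower + upper + 1) `div` 2
  where
    sorted = sort xs
    n      = length sorted
    upper  = sorted !! (n `div` 2)
    lower  = sorted !! (n `div` 2 - 1)  -- only forced when n is even

main :: IO ()
main = do
  print (median [3, 1, 2])     -- 2: odd length, middle element
  print (median [1, 2, 3, 4])  -- 3: mean of 2 and 3 is 2.5, rounded up
  print (median [])            -- 0: empty case
```

The previous one-liner returned the upper middle element for even lengths, which gives the same 3 here but diverges once the two middle values are further apart (for example `[1, 2, 10, 20]`, where the mean is 6).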
-- ---------------------------------------------------------------------------
@ -181,8 +195,11 @@ parseDay :: String -> Maybe Day
parseDay = parseTimeM True defaultTimeLocale "%Y-%m-%d"
-- | First Monday on or before 'day' (start of its ISO week).
-- 'fromEnum' on 'DayOfWeek' is ISO-numbered (Monday=1 .. Sunday=7),
-- so Monday must subtract 0 days, Sunday 6.
weekStart :: Day -> Day
weekStart day = addDays (fromIntegral (negate (fromEnum (dayOfWeek day)))) day
weekStart day =
addDays (fromIntegral (negate (fromEnum (dayOfWeek day) - 1))) day
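A standalone check of the corrected offset, assuming a `time` version that exports `dayOfWeek` from `Data.Time.Calendar` (1.9 or later). The sample dates are illustrative.

```haskell
import Data.Time.Calendar (Day, addDays, dayOfWeek, fromGregorian)

weekStart :: Day -> Day
weekStart day =
  addDays (fromIntegral (negate (fromEnum (dayOfWeek day) - 1))) day

main :: IO ()
main = do
  print (weekStart (fromGregorian 2026 6 8))   -- Monday    -> 2026-06-08
  print (weekStart (fromGregorian 2026 6 10))  -- Wednesday -> 2026-06-08
  print (weekStart (fromGregorian 2026 6 14))  -- Sunday    -> 2026-06-08
```

Without the `- 1`, a Monday would map to the preceding Sunday and a Sunday to the Sunday one week earlier.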
-- | Intensity class for the heatmap (hm0 … hm4).
heatClass :: Int -> String
@ -297,7 +314,7 @@ renderHeatmap wordsByDay today =
nDays = diffDays today startDay + 1
allDays = [addDays i startDay | i <- [0 .. nDays - 1]]
weekOf d = fromIntegral (diffDays d startDay `div` 7) :: Int
dowOf d = fromEnum (dayOfWeek d) -- Mon=0..Sun=6
dowOf d = fromEnum (dayOfWeek d) - 1 -- ISO 1..7 -> Mon=0..Sun=6
svgW = (nWeeks - 1) * step + cellSz
svgH = 6 * step + cellSz + hdrH
@ -752,7 +769,7 @@ renderArchive metrics =
dl [ (k, txt v) | (k, v) <- metrics ]
-- ---------------------------------------------------------------------------
-- Static TOC (matches the nine h2 sections above)
-- Static TOC (matches the eleven h2 sections above)
-- ---------------------------------------------------------------------------
pageTOC :: H.Html

View File

@ -30,16 +30,18 @@ module Tags
) where
import Data.Char (isSpace)
import Data.List (intercalate, isPrefixOf, nub, sort)
import Data.List (intercalate, isPrefixOf, nub, sort, sortBy)
import Data.Maybe (fromMaybe, isNothing, maybeToList)
import Data.Ord (comparing)
import Data.Set (Set)
import qualified Data.Set as Set
import Data.Time.Clock (UTCTime)
import Data.Time.Format (defaultTimeLocale, parseTimeM)
import Hakyll
import Pagination (sortAndGroupAt)
import Patterns (tagIndexable)
import Contexts (abstractField, contentKindField,
recentFirstByDisplay, revisionDateFields,
tagLinksFieldExcludingScope)
import Contexts (Revision (..), abstractField, contentKindField,
getRevisions, recentFirstByDisplay, revisionDateFields,
siteCtx, tagLinksFieldExcludingScope)
-- ---------------------------------------------------------------------------
@ -80,23 +82,23 @@ expandTag t =
-- | Top-level tags that own a section URL outside the tag system, and
-- therefore must NOT be created as tag pages — doing so would
-- collide with a section landing route. The literal @"photography"@
-- is the only one currently affected: every photo's @tags:@ list
-- begins with the bare @"photography"@ portal tag (per the section's
-- convention), and 'tagIdentifier' would route that to
-- @"photography/index.html"@ — already owned by
-- @photographyLandingRules@.
-- collide with a section landing route. Hakyll does not error on
-- duplicate routes (one item silently overwrites the other), so an
-- essay tagged e.g. @music@ would otherwise clobber
-- @music/index.html@. The set therefore lists every namespace that
-- owns a @<name>/index.html@ route, not just the tags currently in
-- use: @photography@ (every photo's @tags:@ list begins with it, per
-- the section convention) plus the other section landings and
-- generated index namespaces.
--
-- Sub-tags (@photography/landscape@, @photography/film@, …) are
-- unaffected; they keep their tag pages because no section landing
-- claims those URLs.
--
-- Other portal tags (@music@, @poetry@, @fiction@, …) don't appear
-- here because their content types don't currently feed
-- 'tagIndexable', so the top-level tag never enters the tag system.
-- Add to this set if that ever changes.
sectionOwnedTopLevelTags :: [String]
sectionOwnedTopLevelTags = ["photography"]
sectionOwnedTopLevelTags =
[ "photography", "poetry", "fiction", "music", "essays", "blog"
, "cv", "archive", "authors", "bibliography"
]
-- | All expanded tags for an item (reads the "tags" metadata field).
-- Filters out any 'sectionOwnedTopLevelTags' to prevent route
@ -293,6 +295,10 @@ sidecarContext sidecarSet tag
-- Provides the fields consumed by @templates/partials/item-card.html@
-- (@$item-kind$@, @$date-iso$@, @$date-created$@, @$abstract$@,
-- @$item-tags$@) with tag-ribbon suppression scoped to the current tag.
--
-- Composes 'siteCtx' (not bare 'defaultContext') so per-item fields
-- the card partial gates on — notably @$has-monogram$@ — fire here
-- the same way they do on /new.html and the library.
tagItemCtx :: String -> Context String
tagItemCtx scope =
contentKindField
@ -301,7 +307,7 @@ tagItemCtx scope =
<> revisionDateFields
<> tagLinksFieldExcludingScope "item-tags" scope
<> abstractField
<> defaultContext
<> siteCtx
-- | Page identifier for a tag index page.
-- Page 1 → <tag>/index.html
@ -359,9 +365,39 @@ clientPaginatedRule tag pat sidecarSet saCtx baseCtx = do
>>= loadAndApplyTemplate "templates/default.html" ctx
>>= relativizeUrls
-- | Display date of an identifier: the most-recent @revised:@ entry's
-- date when present and parseable, else the creation date. Mirrors
-- the (unexported) @itemDisplayUTC@ behind 'Contexts.recentFirstByDisplay',
-- but needs only 'MonadMetadata' — the paginate grouper runs in
-- 'Rules' over bare 'Identifier's, where no 'Item's exist yet.
identifierDisplayUTC :: (MonadMetadata m, MonadFail m)
=> Identifier -> m UTCTime
identifierDisplayUTC ident = do
meta <- getMetadata ident
case getRevisions meta of
(r:_) | Just utc <- (parseTimeM True defaultTimeLocale "%Y-%m-%d"
(revisionDateISO r) :: Maybe UTCTime)
-> return utc
_ -> getItemUTC defaultTimeLocale ident
-- | Partition identifiers into pages of @n@, most recent first by
-- /display/ date — the same revision-aware key
-- 'recentFirstByDisplay' sorts by within each rendered page — so
-- cross-page ordering is monotone. With creation-date partitioning
-- (plain @sortRecentFirst@), a recently revised old item stayed on a
-- late page but jumped to its top; now it migrates to the early page
-- where its displayed date says it belongs.
sortAndGroupByDisplayAt :: (MonadMetadata m, MonadFail m)
=> Int -> [Identifier] -> m [[Identifier]]
sortAndGroupByDisplayAt n ids = do
keyed <- mapM (\i -> (,) <$> identifierDisplayUTC i <*> pure i) ids
return $ paginateEvery n $ map snd $ sortBy (flip (comparing fst)) keyed
-- | Server-side pagination at 'tagPageSize' per page. Previous/next
-- navigation renders via @templates/partials/paginate-nav.html@;
-- the count toggle operates within the current page only.
-- the count toggle operates within the current page only. Pages are
-- partitioned and sorted by the same display-date key (see
-- 'sortAndGroupByDisplayAt').
serverPaginatedRule :: String
-> Pattern
-> Set Identifier
@ -369,7 +405,7 @@ serverPaginatedRule :: String
-> Context String -- ^ base (siteCtx)
-> Rules ()
serverPaginatedRule tag pat sidecarSet saCtx baseCtx = do
paginate <- buildPaginateWith (sortAndGroupAt tagPageSize) pat (tagPageId tag)
paginate <- buildPaginateWith (sortAndGroupByDisplayAt tagPageSize) pat (tagPageId tag)
paginateRules paginate $ \pageNum pat' -> do
route idRoute
compile $ do

View File

@ -27,9 +27,9 @@ wordCount :: String -> Int
wordCount = length . words
-- | Estimate reading time in minutes (assumes 200 words per minute).
-- Minimum is 1 minute.
-- Rounds up — 399 words is 2 minutes, not 1. Minimum is 1 minute.
readingTime :: String -> Int
readingTime s = max 1 (wordCount s `div` 200)
readingTime s = max 1 ((wordCount s + 199) `div` 200)
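The `+ 199` is the usual integer ceiling-division idiom. A short sketch with synthetic inputs, where `nWords` is a hypothetical helper that builds a string of a given word count:

```haskell
wordCount :: String -> Int
wordCount = length . words

readingTime :: String -> Int
readingTime s = max 1 ((wordCount s + 199) `div` 200)

-- Hypothetical helper: a string of n one-letter words.
nWords :: Int -> String
nWords n = unwords (replicate n "w")

main :: IO ()
main = do
  print (readingTime (nWords 0))    -- 1: the floor of one minute
  print (readingTime (nWords 200))  -- 1: exactly one minute
  print (readingTime (nWords 201))  -- 2: rounds up
  print (readingTime (nWords 399))  -- 2: the case named in the comment
```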
-- | Escape HTML special characters: @&@, @<@, @>@, @\"@, @\'@.
--
@ -62,7 +62,11 @@ trim :: String -> String
trim = dropWhileEnd isSpace . dropWhile isSpace
-- | Lowercase a string, drop everything that isn't alphanumeric or
-- space, then replace runs of spaces with single hyphens.
-- space, then replace each space with a hyphen. Note that a run of
-- spaces therefore becomes a run of hyphens (@"A B" → "a--b"@) —
-- deliberately left as-is, since every slug on the site is generated
-- by this one function and collapsing runs now would move existing
-- author URLs.
--
-- Used for author URL slugs (e.g. @"Levi Neuwirth" → "levi-neuwirth"@).
-- Centralised here so 'Authors' and 'Contexts' cannot drift on Unicode

View File

@ -68,8 +68,8 @@ constraints: any.Glob ==0.10.2,
any.deepseq ==1.4.8.1,
any.digest ==0.0.2.1,
any.directory ==1.3.8.5,
any.distributive ==0.6.2.1,
any.djot ==0.1.2.3,
any.distributive ==0.6.3,
any.djot ==0.1.2.4,
any.dlist ==1.0,
any.doclayout ==0.5.0.1,
any.doctemplates ==0.11.0.1,
@ -198,7 +198,7 @@ constraints: any.Glob ==0.10.2,
any.unliftio-core ==0.2.1.0,
any.unordered-containers ==0.2.20.1,
any.utf8-string ==1.0.2,
any.uuid-types ==1.0.6,
any.uuid-types ==1.0.6.1,
any.vault ==0.3.1.6,
any.vector ==0.13.2.0,
any.vector-algorithms ==0.9.1.0,

View File

@ -1,7 +1,6 @@
---
title: Colophon
date: 2026-03-21
modified: 2026-04-27
status: "Durable"
confidence: 93
tags: [meta]

View File

@ -1,23 +0,0 @@
---
title: "The Specification Dilemma"
date: 2026-04-20 # required; used for ordering, feed, and display
abstract: > # optional; shown in the metadata block and link previews
We should not consider AI entities as mere tools, though they may be the raw foundation from which exceptional tools for thought are constructed to augment the human mind. Rather, we should consider AI as the ultimate distillation and consolidation of humanity's achievements - the ultimate progeny of our civilization.
tags: # optional; see Tags section
- ai
- tech
# Epistemic profile — all optional; the entire section is hidden unless `status` is set
status: "Draft" # Draft | Working model | Durable | Refined | Superseded | Deprecated
confidence: 100 # 0–100 integer (%)
importance: 5 # 1–5 integer (rendered as filled/empty dots ●●●○○)
evidence: 1 # 1–5 integer (same)
scope: civilizational # personal | local | average | broad | civilizational
novelty: idiosyncratic # conventional | moderate | idiosyncratic | innovative
practicality: moderate # abstract | low | moderate | high | exceptional
confidence-history: # list of integers; trend arrow derived from last two entries
---
TODO: block quote about Richard Feynman and the beauty of science - idea "it's more beautiful this way"
I have often felt there has been a loss of wonder from the world, and I lament this fact.

View File

@ -1,41 +0,0 @@
---
title: "The Modern Idolatry"
date: 2026-04-06
abstract: >
Thoughts on idolizing notions of success, whether extrinsic or intrinsic, prompted by my upcoming graduation from Brown University and a recent week spent in Paris.
tags:
- miscellany
- philosophy
- personal
- personal/travel
authors:
- "Levi Neuwirth | /me.html"
status: "Draft"
history:
- date: "2026-04-06"
---
Travel affects me profoundly, and the effect is strangely uniform. There is a hierarchical structure of dichotomies that seems to define most aspects of my life, and my interactions with place are no exception to this rule. One of the dichotomies is as follows: I am rather accustomed to moving around in my adult life to date, never spending more than 4 months in a place before spending at least a few weeks somewhere else, and yet I rapidly develop a sense of "home" wherever I am - a stagnation of sorts, an acceptance of the region in which I reside and an abstraction away of the remainder of the world to some vast, esoteric TERRA INCOGNITA. Perhaps the most profound, persistent personal effect of travel on me is that it knocks me out of this mental state of spatial hibernation, reminding me that there is an entire world beyond that which I consistently perceive, and that I have the means to do something to have a positive impact on it. This has been a profoundly important sensation for me to have for many years now, and is thus one basis by which travel is consistently a high priority for me.
This is often combined with a sense of grand melancholy, the sort that for me is nearly ubiquitous in the presence of grandeur and beauty. It is a different incarnation of the same melancholy^[I should emphasize here that while "melancholy" may in general invoke a negative connotation, I do not feel that this is a negative emotion whatsoever. To me, the primary effect of melancholy, or at least melancholy of this sort, is an amplification of the imposing impetus, usually some sense of grandeur. The melancholy is like delicate cinnamon powder added to the top of a pristine flat white.] that I feel when I listen to a profound piece of music, view a painting that I enjoy, or reach the summit of a mountain that I have been embracing for hours. In this case the strength is perhaps yielded by the confluence of grandeur of the natural world - the vastness of space, the mystery of distinct regions that I have yet to know and the warm embrace of returning to those which I know but not well - and that of the human world - the various cultures, languages, beliefs, institutions, and above all people that are present in various places.
This grand, amplified melancholy typically has three causes in my life, two of which I have already mentioned. The third is instances of outward-facing "success" - I typically feel melancholic and pensive when I have done something or crossed some milestone^["Milestones" are not terms that I would use nor guidelines or aspects of some personal timeline or plan, but rather things that society imposes. They don't mean much to me on a personal level, but do unavoidably impact how I feel, since I cannot avoid societal influences as much as I sometimes wish I could.] that many folks see as an indicator of success (or the potential for it). One might imagine, then, that I felt quite a sensation as I was travelling in Paris during my most recent spring break, on the verge of graduating from Brown University after four years of work and extreme personal growth, and such an imagination would be highly warranted. As I took endless walks on the [Champ de Mars](https://en.wikipedia.org/wiki/Champ_de_Mars) and along the [Seine](https://en.wikipedia.org/wiki/Seine) many thoughts and musings were prompted by the grand sensations of emotion, grandeur, and wonder that I felt. They are largely concentrated around the theme of modern idolatry in the name of "success" and the implications of this, on both a personal and broader philosophical and societal level. My attempts to collect them into a format that I can share follow.
## Dichotomies
<figure class="prose-excerpt">
<blockquote>
"Everything is a dichotomy; that is perhaps the grandeur of life, of the Universe itself."
</blockquote>
<figcaption>Levi's personal journal, 29 January 2026</figcaption>
</figure>
::: dropcap
What of "success" do I understand, and what of it have I cumulatively failed to understand? Of course, this question depends on one's chosen definition of "success," so perhaps the most interesting approach is to parameterize our choice of definition. Indeed, SUCCESS is a concept that means different things to different people, so perhaps such parameterization is implicitly necessary. Yet such parameterization unsettles me greatly on a personal level. It is the first example of dichotomy that we, together, may explore.
:::
Society widely seems to view success as the fulfillment of goals rooted in extrinsic motivations. The credentialist nature of our society seems to conflate one's ability to earn a title with competence, experience, and, in some cases, worthiness - and who, exactly, is worthy of success, or, rather, is it success that deems one worthy in the eyes of the world? In more ways than one, it seems that we have been conditioned somehow through our institutions, both explicit and implicit, to conflate worthiness with success, and this conflation is perhaps grounded in the idea that success will be transitive; that is, one's continued association with successful people leads to more successful outcomes. This seems to imply that "success" is somehow a communal thing, inherently extrinsic in that it diffuses and saturates, so long as those who have it^[For the sake of illustration here we are assuming that "success" is something to be had, a notion that will be debunked later.] are willing to continue associating with those who have less of it.
Yet this is in direct contrast to what is arguably the foundation of our^[I use "our" here to refer to citizens of the United States, my country of birth and the culture that largely influenced my perception of success.] success. The extrinsic nature of such success is not problematic, but the communal aspect is. The ethos of the [American Dream](https://en.wikipedia.org/wiki/American_Dream) is largely that of individualism - the promise that dense individual effort leads to success.

View File

@ -1,236 +0,0 @@
---
title: A Test Essay
date: 2026-03-14
abstract: A comprehensive end-to-end exercise of the Hakyll pipeline — typography, code, math, sidenotes, filters, tables, exhibits, and annotations.
tags: [meta]
affiliation: "Department of Imaginary Systems, University of Nowhere | https://example.com"
status: Working model
confidence: 72
importance: 3
evidence: 2
scope: average
novelty: moderate
practicality: moderate
confidence-history: [55, 63, 72]
history:
- date: "2026-03-01"
note: Initial draft
- date: "2026-03-14"
note: Expanded typography and citation sections; added math examples
---
The body typeface is Spectral, a screen-first serif with seven weights and full OpenType support. Old-style figures are enabled by default: the year 2026, the number 1984, Euler's number 2.718. Standard ligatures are active: *first*, *fifty*, *ffle*. The typographic principles informing this layout draw on Butterick[@butterick2019] and Tufte[@tufte1983]. This document is built with Pandoc[@pandoc].
Paragraphs following one another use first-line indentation in the traditional book manner, with no inter-paragraph vertical gap. This is the second paragraph of the opening section, and you should see the indent at the start of this line.
A third paragraph to confirm the indent is consistent across multiple consecutive paragraphs and does not drift or accumulate.
## Typography
### Headings
Headings are set in Fira Sans Semibold, a humanist sans-serif that complements Spectral. The hierarchy below demonstrates all levels used in practice.
## Section heading (H2)
### Subsection heading (H3)
#### Minor heading (H4)
##### Rarely used (H5)
Body text resumes here, following the heading sequence above. The vertical rhythm above each heading and the transition back to Spectral below it should feel natural, not abrupt.
### Inline Elements
This sentence demonstrates **bold emphasis (700)** and <strong class="semibold">semibold emphasis (600)</strong> side by side — the authorial choice the spec describes. Italic text looks like *this phrase set in Spectral italic*. Combined: ***bold italic***.
Abbreviations use Spectral's true small-caps via the `smcp` OpenType feature: the organisations <abbr title="National Science Foundation">NSF</abbr>, <abbr title="American Civil Liberties Union">ACLU</abbr>, and <abbr title="Central Intelligence Agency">CIA</abbr>. These should appear as genuine small capitals, not scaled-down full caps.
Superscripts use Spectral's `sups` glyphs: E = mc^2^, footnote reference^1^, ordinals like 1^st^ and 2^nd^. Subscripts use `subs`: H~2~O, CO~2~.
Inline code looks like `cabal run site -- build` and sits comfortably in a line of Spectral body text. The size differential and background tint should clearly distinguish it without being jarring.
### Blockquotes
> The site is the proof. If a site about careful writing is itself carelessly made, the argument is self-defeating. Every element must earn its presence.
Text resumes after the blockquote without indent — the indent reset rule is working if this line begins flush left.
> A nested quotation scenario: this outer blockquote contains ordinary text, establishing the left-border visual hierarchy.
## Code
JetBrains Mono is used for all code. Ligatures and contextual alternates are active: `->` `=>` `!=` `::` `>=` in inline code, and in blocks below.
```haskell
-- Hakyll site compiler entry point
module Main where
import Hakyll (hakyll)
import Site (rules)
main :: IO ()
main = hakyll rules
```
```css
/* CSS custom property example */
:root {
--bg: #faf8f4;
--text: #1a1a1a;
}
body {
background-color: var(--bg);
color: var(--text);
font-feature-settings: 'liga' 1, 'onum' 1;
}
```
```python
def greet(name: str) -> str:
return f"Hello, {name}!"
```
The code block border, background tint, and monospaced font should feel quiet — part of the page, not a jarring box.
## Tables
Tables use Fira Sans at 90% size, with lining figures and tabular spacing enabled for numeric alignment.
| Font | Role | Weight(s) | File size |
|:---------------|:----------------|:------------|:----------|
| Spectral | Body text | 400, 600, 700 | 21–24 KB |
| Fira Sans | UI / headings | 400, 600 | 16 KB |
| JetBrains Mono | Code | 400 | 19–20 KB |
## Dark Mode
Use the toggle in the top-right corner of the nav to switch between light and dark. Both themes use warm monochrome palettes derived from the same base hue. The background, text, borders, muted text, code blocks, and blockquote borders should all shift coherently.
Check the following specifically in dark mode: sidenotes, code block backgrounds, the blockquote border, and the table header row. The `transition` on `body` should make the switch feel smooth rather than abrupt.
- Background: `#1c1a18` (warm dark, not pure black)
- Text: `#e8e5df` (warm off-white, not pure white)
- Muted text, borders: proportionally darker warm greys
## Mathematics
The quadratic formula solves $ax^2 + bx + c = 0$ for real roots:
$$x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$$
This is a well-known result.[^quadratic] Euler's identity is often cited as the most beautiful equation in mathematics:
$$e^{i\pi} + 1 = 0$$
It connects the five most important constants in mathematics.[^euler] The CSS smallcaps filter should catch abbreviations like NASA, HTML, CSS, and API automatically.
[^quadratic]: The formula follows directly from completing the square. For a derivation, see any introductory algebra text, e.g. Stewart's *Precalculus*.
[^euler]: This follows from Euler's formula $e^{i\theta} = \cos\theta + i\sin\theta$ evaluated at $\theta = \pi$.
### Turán's Theorem
The Turán graph $T(n,k)$ is the complete $k$-partite graph on $n$ vertices with part sizes as equal as possible. Its edge count is given by the formula below — this is the identity the moving-vertex argument exploits.
::: {.exhibit .exhibit--equation data-exhibit-name="Turán Edge Count" data-exhibit-type="equation" data-exhibit-caption="Edge count of a complete k-partite graph: total pairs minus same-part pairs."}
:::: exhibit-body
$$\binom{n}{2} - \sum_{i=1}^{k}\binom{m_i}{2}$$
::::
:::
Every pair of vertices is adjacent *except* those within the same part, so the formula counts edges by subtracting same-part pairs from all pairs.
::: {.annotation .annotation--static}
<div class="annotation-header">
<span class="annotation-label">Remark</span>
<span class="annotation-name">Equal parts maximise edges</span>
</div>
<div class="annotation-body">
Intuitively: if two parts differ in size by more than one vertex, moving a vertex from the larger to the smaller part creates more cross-part pairs than it destroys within-part pairs. The moving-vertex argument below makes this precise.
</div>
:::
::: {.annotation .annotation--collapsible}
<div class="annotation-header">
<span class="annotation-label">Note</span>
<span class="annotation-name">Turán graph definition</span>
<button class="annotation-toggle" aria-expanded="false">▸ expand</button>
</div>
<div class="annotation-body">
The *Turán graph* $T(n,k)$ is the unique (up to isomorphism) complete $k$-partite graph on $n$ vertices whose part sizes differ by at most one. By Turán's theorem, $T(n,k)$ is the $K_{k+1}$-free graph on $n$ vertices with the maximum number of edges.
</div>
:::
::: {.exhibit .exhibit--proof data-exhibit-name="Turán Bound" data-exhibit-type="proof" data-exhibit-caption="Moving one vertex from the larger to the smaller part strictly increases the edge count when parts differ by ≥ 2."}
:::: exhibit-body
Without loss of generality suppose $n_1 - n_2 \ge 2$. Form a new complete $k$-partite graph by moving one vertex from part 1 to part 2. Since the new graph is still complete $k$-partite on the same $n$ vertices, it suffices to show it has strictly more edges.
The number of edges in any complete $k$-partite graph $M_{m_1,\ldots,m_k}$ is
$$\binom{n}{2} - \sum_{i=1}^{k}\binom{m_i}{2},$$
since every pair of vertices is adjacent *except* those within the same part. Therefore
$$|E(G')| - |E(G)| = \binom{n_1}{2} + \binom{n_2}{2} - \binom{n_1-1}{2} - \binom{n_2+1}{2}.$$
Using $\binom{m}{2} = \frac{m(m-1)}{2}$, this simplifies to $(n_1 - 1) - n_2 = n_1 - n_2 - 1$. Since $n_1 - n_2 \ge 2$, we get $|E(G')| - |E(G)| \ge 1 > 0$. [□]{.proof-qed}
::::
:::
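
The simplification step in the proof above can be spelled out term by term. This is a worked expansion in the proof's own notation, assuming nothing beyond $\binom{m}{2} = \frac{m(m-1)}{2}$; it is a sketch for the reader, not part of the original exhibit.

$$\binom{n_1}{2} - \binom{n_1-1}{2} = \frac{n_1(n_1-1)}{2} - \frac{(n_1-1)(n_1-2)}{2} = \frac{(n_1-1)\cdot 2}{2} = n_1 - 1,$$

$$\binom{n_2}{2} - \binom{n_2+1}{2} = \frac{n_2(n_2-1)}{2} - \frac{(n_2+1)\,n_2}{2} = \frac{n_2\cdot(-2)}{2} = -n_2.$$

Adding the two lines gives $|E(G')| - |E(G)| = (n_1 - 1) - n_2 = n_1 - n_2 - 1$. As a sanity check with $n_1 = 5$ and $n_2 = 2$: $\binom{5}{2} + \binom{2}{2} = 11$ and $\binom{4}{2} + \binom{3}{2} = 9$, a difference of $2 = 5 - 2 - 1$.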
## Music Notation
Score fragments are embedded inline as responsive SVGs, integrated with the gallery focusable system. Clicking the fragment — or the expand glyph that appears on hover — opens the shared overlay. The SVG inherits the page's text color via `currentColor`, so notation renders correctly in both light and dark modes. The caption below the score is a persistent `<figcaption>`, in keeping with the convention of printed musical editions.
Prose commentary surrounds the fragment just as it would in an analytical text — above to introduce the passage, below to elaborate on what was shown.
## Links and Wikilinks
External links with domain classes: [Wikipedia on the quadratic formula](https://en.wikipedia.org/wiki/Quadratic_formula), an [arXiv preprint](https://arxiv.org/abs/1234.5678), a [DOI link](https://doi.org/10.1000/xyz123), and [jgm/pandoc on GitHub](https://github.com/jgm/pandoc). A generic external: [example.com](https://example.com).
An internal link [to the essay index](/essays/index.html) is left completely unchanged — no extra classes or attributes added.
Wikilinks: [[About This Site]] resolved from `[[About This Site]]`, and [[The Colophon|the colophon]] resolved from `[[The Colophon|the colophon]]`.
## Filter Output
### Abbreviations
`Filters.Typography` matches exact Pandoc `Str` tokens against a table of common Latin abbreviations and wraps them in `<abbr title="…">` elements. Hover over the highlighted abbreviations below to see the tooltip.
Common scholarly shorthand: e.g. the quadratic formula, i.e. the formula $x = \frac{-b \pm \sqrt{b^2-4ac}}{2a}$. See cf. Stewart §3.4. The argument follows from first principles, viz. the moving-vertex technique. NB: the result holds only for $k \ge 2$.
### Smallcaps
`Filters.Smallcaps` detects runs of three or more uppercase letters and wraps them in `<abbr class="smallcaps">`. Technology acronyms detected automatically: HTML, CSS, API, JSON, URL, NASA, MIT. Trailing punctuation is stripped before the check so HTTP, and REST. also work correctly.
Not converted: short tokens like I, OK (two letters), or mixed-case tokens like JavaScript, macOS, or LaTeX.
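
The detection rule described above can be approximated in a few lines. The following is a minimal JavaScript sketch, not the code of `Filters.Smallcaps` (which is not shown here); the function name `isSmallcapsCandidate` and the exact set of trailing punctuation are assumptions made for illustration.

```javascript
// Hypothetical helper: mirrors the prose rule, not the real filter.
// Assumption: "trailing punctuation" means any of . , ; : ! ? )
function isSmallcapsCandidate(token) {
  // Strip trailing punctuation before the check, as the text describes.
  var core = token.replace(/[.,;:!?)]+$/, '');
  // The whole remaining token must be three or more uppercase letters,
  // which excludes short tokens (I, OK) and mixed case (macOS, LaTeX).
  return /^[A-Z]{3,}$/.test(core);
}
```

Under these assumptions, `isSmallcapsCandidate('HTTP,')` holds while `isSmallcapsCandidate('OK')` and `isSmallcapsCandidate('JavaScript')` do not, matching the examples listed above.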
### Annotations
::: {.annotation .annotation--static}
<div class="annotation-header">
<span class="annotation-label">Remark</span>
<span class="annotation-name">On static annotations</span>
</div>
<div class="annotation-body">
This is a static annotation. It is always visible and has no toggle. The border separates the header from the body.
</div>
:::
::: {.annotation .annotation--collapsible}
<div class="annotation-header">
<span class="annotation-label">Note</span>
<span class="annotation-name">On collapsible annotations</span>
<button class="annotation-toggle" aria-expanded="false">▸ expand</button>
</div>
<div class="annotation-body">
This annotation is collapsed by default. The abbreviations i.e. and e.g. should be wrapped in `<abbr>` tags by `Filters.Typography`. Clicking the button should expand and collapse this body smoothly, with the last line fully visible.
</div>
:::

View File

@ -1,47 +0,0 @@
---
title: "Universities Should Care"
date: 2026-04-28 # required; used for ordering, feed, and display
abstract: > # optional; shown in the metadata block and link previews
Students should be more than a mere statistic to the universities at which they study. I critique Brown University, my undergraduate institution, in this regard. Treating students as mere statistics is potentially a major reason for the decline of postsecondary education in the modern United States.
tags: # optional; see Tags section
- ai
- tech
# Epistemic profile — all optional; the entire section is hidden unless `status` is set
status: "Draft" # Draft | Working model | Durable | Refined | Superseded | Deprecated
confidence: 85 # 0-100 integer (%)
importance: 4 # 1-5 integer (rendered as filled/empty dots ●●●○○)
evidence: 5 # 1-5 integer (same)
scope: broad # personal | local | average | broad | civilizational
novelty: moderate # conventional | moderate | idiosyncratic | innovative
practicality: high # abstract | low | moderate | high | exceptional
confidence-history: # list of integers; trend arrow derived from last two entries
---
---
Planning: List of grievances
COMPUTER SCIENCE
- TA System section.
-
RES LIFE
- Obviously: repeated requests for discussion and process for moving out in Fall '23.
- Unable to control heat
- Lack of bathrooms.
- Lack of kitchens
DINING
- Let's run through some calculations to see the actual cost of every meal averaged across a semester.
- No real late night options.
- Poor optimization of queues / high demand items like grilled cheese.
- Inconsistent pricing for the same items across locations.
SECURITY
- No substantive changes since December 13th.
EFFECTS ON THE CULTURE

View File

@ -13,7 +13,6 @@ importance: 1
scope: personal
novelty: conventional
practicality: moderate
confidence-history:
---
A fuller write-up follows. In the meantime, see the [projects index](/cv/projects/).

View File

@ -18,7 +18,6 @@ evidence: 4
scope: broad
novelty: innovative
practicality: high
confidence-history:
---
A fuller write-up follows with the clinical-implications manuscript. In the meantime, see the [projects index](/cv/projects/).

View File

@ -1,23 +1,20 @@
---
title: "Speculative Reluctance"
date: 2026-04-15 # required; used for ordering, feed, and display
abstract: > # optional; shown in the metadata block and link previews
date: 2026-04-15
abstract: >
AI labs are likely deliberately reluctant to scale because they are aware that any imminent shift to locally run models as the norm would render their compute redundant. We take Anthropic as a principal case study to validate this hypothesis.
tags: # optional; see Tags section
tags:
- ai
- tech
- speculative
- open
# Epistemic profile — all optional; the entire section is hidden unless `status` is set
status: "Draft" # Draft | Working model | Durable | Refined | Superseded | Deprecated
confidence: 55 # 0-100 integer (%)
importance: 3 # 1-5 integer (rendered as filled/empty dots ●●●○○)
evidence: 1 # 1-5 integer (same)
scope: broad # personal | local | average | broad | civilizational
novelty: moderate # conventional | moderate | idiosyncratic | innovative
practicality: high # abstract | low | moderate | high | exceptional
confidence-history: # list of integers; trend arrow derived from last two entries
status: "Draft"
confidence: 55
importance: 3
evidence: 1
scope: broad
novelty: moderate
practicality: high
---
Running a lab that develops frontier LLMs is somewhat like playing a game that, by every externally measurable metric, you are bound to lose. The compute required to train a frontier LLM is unbelievably expensive. The expense of inference is even more astronomical. OpenAI claims, at the time of this writing, to have somewhere between 900 million and 1 billion active users, all of whom incur some inference cost, and a small subset of whom consume an enormous amount of compute - to use their words, this is ["commercial scale"](https://openai.com/index/accelerating-the-next-phase-ai/). This isn't to mention the immense amount of competition - there are many major players in the United States alone contributing models that push the boundaries. OpenAI may have been the first, but Anthropic, Google, Meta, xAI, and, yes, even Amazon and ByteDance are following right along.

View File

@ -19,7 +19,6 @@ evidence: 5
scope: civilizational
novelty: innovative
practicality: moderate
confidence-history:
---
There are at least two distinct ways to reduce the search space over which AGI^[The definition of "Artificial General Intelligence", or whether such a definition exists, is contentious. My use of the term is not intended to endorse any proposed timeline for AGI, nor to suggest that it is inevitable. It is rather to provide calibration through a hypothetical goal that clearly justifies pursuit.] will have to operate. The first involves a harmonious interaction of agent and human, not transactional in origin, not fully autonomous nor fully human-driven, but rather collaborative in nature - the agent augments the capacity of the human, just as any other good tool for thought does, by working within the scope of something well specified and ideated upon. This is not to say that the agent cannot have a place in such planning, but rather that the human is ultimately the driver of the actions and tasks, defining the scope of what is to be done in as much detail as possible without being the one to actually do it.

View File

@ -14,7 +14,6 @@ importance: 1
scope: local
novelty: moderate
practicality: low
confidence-history:
---
A fuller write-up follows. In the meantime, see the [projects index](/cv/projects/).

View File

@ -19,7 +19,7 @@ authors:
affiliation:
- "Department of Computer Science, Brown University | https://cs.brown.edu"
bibliography: data/simd-paper.bib
repository: "https://git.levineuwirth.org/where-simd-helps"
repository: "https://git.levineuwirth.org/neuwirth/where-simd-helps"
---
## Introduction

View File

@ -42,8 +42,20 @@ add_header Permissions-Policy
# report stream has been clean for a week.
#
# External origins justified inline:
# cdn.jsdelivr.net KaTeX CSS + JS, Vega / Vega-Lite / Vega-Embed
# cdn.jsdelivr.net KaTeX CSS + JS + webfonts (the KaTeX CSS
# references its fonts relatively, so they
# resolve to the CDN -> font-src), Vega /
# Vega-Lite / Vega-Embed, transformers.js
# (whose onnxruntime fetches its .wasm from
# the CDN via fetch() -> connect-src)
# *.basemaps.cartocdn.com Leaflet basemap tiles (photography map only)
# connect-src API hosts link-popup providers fetched directly via
# CORS (the list popups.js documents in its
# header, plus git.levineuwirth.org for the
# Forgejo provider). The CORS-broken trio
# (arxiv, archive.org, pubmed) goes through
# the same-origin /proxy/ instead — see
# nginx/popup-proxy.conf.
#
# Why 'unsafe-inline' on style:
# - photography.html emits <span style="background:$swatch$"> for
@ -53,18 +65,14 @@ add_header Permissions-Policy
# Why 'unsafe-eval' on script:
# - vega-embed compiles Vega-Lite specs at runtime via new Function().
# Removing this would require pre-compiling specs at build time.
# - it also covers WebAssembly.instantiate for onnxruntime-web
# (semantic search).
#
# The value MUST stay on one physical line: nginx has no line
# continuation inside quoted strings — a trailing backslash would embed
# literal backslash + LF bytes in the header value, which is illegal in
# HTTP/2 and gets whole responses rejected by strict clients.
#
# To collect violation reports, set up a `report-uri` endpoint and add
# `report-uri /csp-report;` (and/or `report-to <group>;`) below.
add_header Content-Security-Policy-Report-Only
"default-src 'self'; \
script-src 'self' 'unsafe-eval' https://cdn.jsdelivr.net; \
style-src 'self' 'unsafe-inline' https://cdn.jsdelivr.net; \
img-src 'self' data: https://*.basemaps.cartocdn.com; \
font-src 'self' data:; \
connect-src 'self'; \
frame-ancestors 'none'; \
base-uri 'self'; \
form-action 'self'; \
object-src 'none'; \
upgrade-insecure-requests" always;
add_header Content-Security-Policy-Report-Only "default-src 'self'; script-src 'self' 'unsafe-eval' https://cdn.jsdelivr.net; style-src 'self' 'unsafe-inline' https://cdn.jsdelivr.net; img-src 'self' data: https://*.basemaps.cartocdn.com; font-src 'self' data: https://cdn.jsdelivr.net; connect-src 'self' https://cdn.jsdelivr.net https://*.wikipedia.org https://api.crossref.org https://api.github.com https://openlibrary.org https://api.biorxiv.org https://www.youtube.com https://git.levineuwirth.org; frame-ancestors 'none'; base-uri 'self'; form-action 'self'; object-src 'none'; upgrade-insecure-requests" always;

View File

@ -7,7 +7,6 @@ dependencies = [
# Visualization
"matplotlib>=3.9,<4",
"altair>=5.4,<6",
# Embedding pipeline
# Upper bounds are intentionally generous (next major) but always
# present so that an unrelated `uv sync` upgrade can't silently pull
@ -18,7 +17,6 @@ dependencies = [
"beautifulsoup4>=4.12,<5",
# CPU-only torch — avoids pulling ~3 GB of CUDA libraries
"torch>=2.5,<3",
# Photography pipeline
# Pillow handles EXIF reading when exiftool is not installed (the
# preferred path); colorthief computes the 5-color palette strip.
@ -26,6 +24,10 @@ dependencies = [
"pillow>=10.0,<12",
"colorthief>=0.2,<1",
"pyyaml>=6.0,<7",
# Not imported by this repo: required at runtime by nomic-embed's
# remote modeling code (nomic-bert-2048, loaded by embed.py's page
# pass under trust_remote_code with a pinned code_revision).
"einops>=0.8.2,<1",
]
[[tool.uv.index]]

View File

@ -70,34 +70,35 @@ nav.site-nav {
}
/* Home logo: square button flush into the top-left corner of the nav bar.
The L silhouette is rendered via ::before mask-image so the background
matches --bg-nav exactly and the foreground follows --nav-logo-fg (set
per theme in base.css; override there to restyle for light mode). */
The rooted-L mark lives in /logo-sprite.svg and is referenced with
<use> (cacheable once, not ~33 KB inlined per page). Its two-tone
cutout still renders because CSS custom properties cascade into the
use-element shadow tree: the letter is drawn in --logo-ink and the
root filament is punched through in --logo-bg. Mapping --logo-bg to
--bg-nav (the button's own surface) makes the roots read as the nav
background showing through. Both tokens are theme-driven in
base.css; override --nav-logo-fg / --bg-nav there to restyle per
theme. */
.nav-logo {
position: absolute;
left: 0;
top: 0;
bottom: 0;
aspect-ratio: 1 / 1;
display: block;
display: flex;
align-items: center;
justify-content: center;
overflow: hidden;
flex-shrink: 0;
text-decoration: none;
background-color: var(--bg-nav);
--logo-ink: var(--nav-logo-fg);
--logo-bg: var(--bg-nav);
}
.nav-logo::before {
content: '';
position: absolute;
inset: 12%;
background-color: var(--nav-logo-fg);
mask-image: url('/images/link-icons/internal.svg');
mask-size: contain;
mask-repeat: no-repeat;
mask-position: center;
-webkit-mask-image: url('/images/link-icons/internal.svg');
-webkit-mask-size: contain;
-webkit-mask-repeat: no-repeat;
-webkit-mask-position: center;
.nav-logo__mark {
width: 76%;
height: 76%;
display: block;
}
/* Controls cluster: portals toggle + theme toggle, pinned right */

View File

@ -16,8 +16,10 @@
For an inline <span> inside a <p>, this is roughly the line containing
the sidenote reference, giving correct vertical alignment without JS.
On narrow viewports the <span> is hidden and the Pandoc-generated
<section class="footnotes"> at document end is shown instead.
On narrow viewports the <span> is hidden and the
<section class="footnotes"> the Sidenotes filter appends at document
end is shown instead (Pandoc's own footnote section never exists:
the filter consumes every Note, and re-emits this fallback itself).
*/
/* ============================================================
@ -137,22 +139,54 @@
/* ============================================================
FOOTNOTE REFERENCES: shown on narrow viewports alongside
section.footnotes
FOOTNOTES FALLBACK LIST: the section the Sidenotes filter
appends at document end; visible on narrow viewports only
(see the media queries above). Letter labels are rendered
explicitly because an <ol>'s automatic numbers would disagree
with the in-text letter refs.
============================================================ */
a.footnote-ref {
text-decoration: none;
color: var(--text-faint);
font-size: 0.75em;
line-height: 0;
section.footnotes .footnotes-list {
list-style: none;
margin: 0;
padding: 0;
}
.footnote-item {
position: relative;
top: -0.4em;
padding-left: 1.5rem;
margin-bottom: 0.85rem;
font-size: 0.85rem;
line-height: 1.6;
color: var(--text-muted);
}
.footnote-label {
position: absolute;
left: 0;
top: 0.15em;
font-family: var(--font-sans);
font-size: 0.75em;
color: var(--text-faint);
}
/* First paragraph flows on the label's line; later ones stack. */
.footnote-item > p {
margin: 0 0 0.5em;
}
.footnote-item > p:first-of-type {
display: inline;
}
.footnote-back {
margin-left: 0.35em;
text-decoration: none;
font-family: var(--font-sans);
color: var(--text-faint);
transition: color var(--transition-fast);
}
a.footnote-ref:hover {
.footnote-back:hover {
color: var(--text-muted);
}


View File

@ -12,6 +12,8 @@
var STORAGE_KEY = 'site-annotations';
var tooltip = null;
var tooltipTimer = null;
var tooltipPinned = false; /* keyboard-opened: blur must not dismiss */
var tooltipMark = null; /* mark that opened the tooltip, for focus return */
/* ------------------------------------------------------------------
Storage
@ -148,6 +150,18 @@
tooltip.addEventListener('mouseenter', function () { clearTimeout(tooltipTimer); });
tooltip.addEventListener('mouseleave', function () { hideTooltip(false); });
/* Keyboard flow: Escape closes a pinned tooltip and returns focus
to its mark; tabbing out of the tooltip dismisses it. */
tooltip.addEventListener('keydown', function (e) {
if (e.key === 'Escape') {
hideTooltip(true);
if (tooltipMark) tooltipMark.focus();
}
});
tooltip.addEventListener('focusout', function (e) {
if (!tooltip.contains(e.relatedTarget)) hideTooltip(false);
});
}
/* Defer to the shared utility (loaded synchronously from
@ -159,6 +173,8 @@
function showTooltip(mark, ann) {
clearTimeout(tooltipTimer);
tooltipPinned = false;
tooltipMark = mark;
var note = ann.note || '';
var created = ann.created ? new Date(ann.created).toLocaleDateString() : '';
@ -197,6 +213,7 @@
function hideTooltip(immediate) {
clearTimeout(tooltipTimer);
tooltipPinned = false;
if (immediate) {
if (tooltip) tooltip.classList.remove('is-visible');
} else {
@ -212,6 +229,28 @@
showTooltip(mark, ann);
});
mark.addEventListener('mouseleave', function () { hideTooltip(false); });
/* Keyboard: focus mirrors hover; Enter/Space pins the tooltip and
moves focus to its Delete button; Escape dismisses. */
mark.setAttribute('tabindex', '0');
mark.addEventListener('focus', function () {
clearTimeout(tooltipTimer);
showTooltip(mark, ann);
});
mark.addEventListener('blur', function () {
if (!tooltipPinned) hideTooltip(false);
});
mark.addEventListener('keydown', function (e) {
if (e.key === 'Enter' || e.key === ' ') {
e.preventDefault();
showTooltip(mark, ann);
tooltipPinned = true;
var del = tooltip.querySelector('.ann-tooltip-delete');
if (del) del.focus();
} else if (e.key === 'Escape') {
hideTooltip(true);
}
});
}
/* ------------------------------------------------------------------

View File

@ -1,86 +0,0 @@
/* citations.js: hover tooltip for inline citation markers.
On hover of a .cite-marker, reads the matching bibliography entry from
the DOM and shows it in a floating tooltip. On click, follows the href
to jump to the bibliography section. Phase 3 popups.js can supersede this. */
(function () {
'use strict';
let activeTooltip = null;
let hideTimer = null;
function makeTooltip(html) {
const el = document.createElement('div');
el.className = 'cite-tooltip';
el.innerHTML = html;
el.addEventListener('mouseenter', () => clearTimeout(hideTimer));
el.addEventListener('mouseleave', scheduleHide);
return el;
}
function positionTooltip(tooltip, anchor) {
document.body.appendChild(tooltip);
const aRect = anchor.getBoundingClientRect();
const tRect = tooltip.getBoundingClientRect();
let left = aRect.left + window.scrollX;
let top = aRect.top + window.scrollY - tRect.height - 10;
// Keep horizontally within viewport with margin
const maxLeft = window.innerWidth - tRect.width - 12;
left = Math.max(8, Math.min(left, maxLeft));
// Flip below anchor if not enough room above
if (top < window.scrollY + 8) {
top = aRect.bottom + window.scrollY + 10;
}
tooltip.style.left = left + 'px';
tooltip.style.top = top + 'px';
}
function scheduleHide() {
hideTimer = setTimeout(() => {
if (activeTooltip) {
activeTooltip.remove();
activeTooltip = null;
}
}, 180);
}
function getRefHtml(refEl) {
// Strip the [N] number span, return the remaining innerHTML
const clone = refEl.cloneNode(true);
const num = clone.querySelector('.ref-num');
if (num) num.remove();
return clone.innerHTML.trim();
}
function init() {
document.querySelectorAll('.cite-marker').forEach(marker => {
const link = marker.querySelector('a.cite-link');
if (!link) return;
const href = link.getAttribute('href');
if (!href || !href.startsWith('#')) return;
const refEl = document.getElementById(href.slice(1));
if (!refEl) return;
marker.addEventListener('mouseenter', () => {
clearTimeout(hideTimer);
if (activeTooltip) { activeTooltip.remove(); }
activeTooltip = makeTooltip(getRefHtml(refEl));
positionTooltip(activeTooltip, marker);
});
marker.addEventListener('mouseleave', scheduleHide);
});
}
if (document.readyState === 'loading') {
document.addEventListener('DOMContentLoaded', init);
} else {
init();
}
})();

View File

@ -9,9 +9,18 @@
(function () {
'use strict';
var PREFIX = 'section-collapsed:';
/* Keys are namespaced by pathname: Pandoc auto-slugs (#introduction,
#background) recur across essays, and an un-namespaced key would
collapse the same-named section on every page. */
var PREFIX = 'section-collapsed:' + location.pathname + ':';
var store = window.lnUtils && window.lnUtils.safeStorage;
function initHeading(heading) {
// Idempotence guard: reinitCollapse may be called more than once on
// the same container — never re-wrap a section or stack toggle
// buttons (matches the popups.js/sidenotes.js convention).
if (heading.dataset.collapseBound === '1') return;
var level = parseInt(heading.tagName[1], 10);
var content = [];
var node = heading.nextElementSibling;
@ -24,6 +33,7 @@
node = node.nextElementSibling;
}
if (!content.length) return;
heading.dataset.collapseBound = '1';
// Wrap collected nodes in a .section-body div.
var wrapper = document.createElement('div');
@ -41,7 +51,7 @@
// Restore persisted state without transition flash.
var key = PREFIX + heading.id;
var collapsed = localStorage.getItem(key) === '1';
var collapsed = store ? store.get(key) === '1' : false;
function setCollapsed(c, animate) {
if (!animate) wrapper.style.transition = 'none';
@ -80,7 +90,7 @@
void wrapper.offsetHeight; // force reflow
}
setCollapsed(!isCollapsed, true);
localStorage.setItem(key, isCollapsed ? '0' : '1');
if (store) store.set(key, isCollapsed ? '0' : '1');
});
// After open animation: release the height cap so late-rendering

View File

@ -17,9 +17,18 @@
btn.setAttribute('aria-label', 'Copy code to clipboard');
btn.addEventListener('click', function () {
var text = pre.querySelector('code')
? pre.querySelector('code').innerText
: pre.innerText;
var code = pre.querySelector('code');
var text;
if (code) {
text = code.innerText;
} else {
/* Code-less <pre>: clone and strip the injected button so
its label is not copied along with the content. */
var clone = pre.cloneNode(true);
var cloneBtn = clone.querySelector('.copy-btn');
if (cloneBtn) cloneBtn.remove();
text = clone.innerText;
}
navigator.clipboard.writeText(text).then(function () {
btn.textContent = 'copied';

View File

@ -88,6 +88,21 @@
return exhibit.dataset.exhibitCaption || '';
}
/* Make an exhibit wrapper keyboard-operable: role=button, tabindex,
and Enter/Space sharing the click path. closeOverlay()'s focus
return relies on the wrapper being focusable. */
function bindActivation(el, activate) {
el.setAttribute('role', 'button');
el.setAttribute('tabindex', '0');
el.addEventListener('click', activate);
el.addEventListener('keydown', function (e) {
if (e.key === 'Enter' || e.key === ' ') {
e.preventDefault();
activate();
}
});
}
function discoverFocusableMath(markdownBody) {
markdownBody.querySelectorAll('.katex-display').forEach(function (katexEl) {
var source = getSource(katexEl);
@ -118,8 +133,8 @@
};
focusables.push(entry);
/* Click anywhere on the wrapper opens the overlay */
wrapper.addEventListener('click', function () {
/* Click or Enter/Space anywhere on the wrapper opens the overlay */
bindActivation(wrapper, function () {
openOverlay(focusables.indexOf(entry));
});
});
@ -151,7 +166,7 @@
};
focusables.push(entry);
figEl.addEventListener('click', function () {
bindActivation(figEl, function () {
openOverlay(focusables.indexOf(entry));
});
});

View File

@ -165,7 +165,12 @@
var images = document.querySelectorAll('img[data-lightbox]');
images.forEach(function (el) {
el.addEventListener('click', function () {
// Keyboard activation: the trigger acts as a button, and the
// tabindex also lets close() return focus to it.
el.setAttribute('tabindex', '0');
el.setAttribute('role', 'button');
function activate() {
// Look for a sibling figcaption in the parent figure
var figcaptionText = '';
var parent = el.parentElement;
@ -176,6 +181,14 @@
}
}
open(el.src, el.alt, figcaptionText, el);
}
el.addEventListener('click', activate);
el.addEventListener('keydown', function (e) {
if (e.key === 'Enter' || e.key === ' ') {
e.preventDefault();
activate();
}
});
});
@ -199,11 +212,42 @@
setInfoVisible(!overlay.classList.contains('is-info-visible'));
});
// Escape closes; "i" toggles info panel (darkroom only).
/* Focus trap for the overlay: cycle Tab/Shift+Tab through the
focusable controls inside the lightbox so keyboard users
cannot tab out into the obscured page background. Same
approach as gallery.js's trapTab; the [hidden] exclusion
covers infoBtn, which is hidden outside darkroom mode. */
function trapTab(e) {
var focusable = Array.from(overlay.querySelectorAll(
'button:not([disabled]):not([hidden]), [tabindex]:not([tabindex="-1"])'
));
if (focusable.length === 0) {
e.preventDefault();
return;
}
var first = focusable[0];
var last = focusable[focusable.length - 1];
var active = document.activeElement;
if (e.shiftKey) {
if (active === first || !overlay.contains(active)) {
e.preventDefault();
last.focus();
}
} else {
if (active === last || !overlay.contains(active)) {
e.preventDefault();
first.focus();
}
}
}
// Escape closes; Tab is trapped; "i" toggles info panel (darkroom only).
document.addEventListener('keydown', function (e) {
if (!overlay.classList.contains('is-open')) return;
if (e.key === 'Escape') {
close();
} else if (e.key === 'Tab') {
trapTab(e);
} else if ((e.key === 'i' || e.key === 'I')
&& overlay.classList.contains('darkroom')
&& !infoBtn.hidden) {
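
The wrap-around decision inside `trapTab` can be read as a small pure function. The sketch below models it with indices instead of DOM elements so it can be checked without a page; `trapDecision` and its return shape are assumptions introduced for illustration, and `activeIndex` is `-1` when focus is outside the overlay.

```javascript
// Hypothetical pure model of the focus trap shown above.
// Returns whether Tab's default should be prevented and, if focus must
// move, the index of the control to focus (null otherwise).
function trapDecision(count, activeIndex, shiftKey) {
  if (count === 0) {
    // Nothing focusable: swallow Tab so focus cannot leave the overlay.
    return { prevent: true, focus: null };
  }
  var last = count - 1;
  if (shiftKey) {
    if (activeIndex === 0 || activeIndex === -1) {
      return { prevent: true, focus: last };   // wrap backwards
    }
  } else if (activeIndex === last || activeIndex === -1) {
    return { prevent: true, focus: 0 };        // wrap forwards
  }
  // Mid-list: let the browser move focus normally.
  return { prevent: false, focus: null };
}
```

Only the two edges and the outside-the-overlay case intercept the key; every other Tab press falls through to the browser's own ordering.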

View File

@ -17,17 +17,23 @@
const toggle = document.querySelector('.nav-portal-toggle');
if (!portals || !toggle) return;
// safeStorage (utils.js, loaded synchronously before us) so a
// storage-blocked context can't throw before the click listener
// below binds; guarded like theme.js in case utils.js itself
// failed to load.
const store = window.lnUtils && window.lnUtils.safeStorage;
function setOpen(open) {
portals.classList.toggle('is-open', open);
toggle.setAttribute('aria-expanded', String(open));
// Rotate arrow indicator if present.
const arrow = toggle.querySelector('.nav-portal-arrow');
if (arrow) arrow.textContent = open ? '▲' : '▼';
localStorage.setItem(STORAGE_KEY, open ? '1' : '0');
if (store) store.set(STORAGE_KEY, open ? '1' : '0');
}
// Restore persisted state; default is collapsed.
const stored = localStorage.getItem(STORAGE_KEY);
const stored = store ? store.get(STORAGE_KEY) : null;
setOpen(stored === '1');
toggle.addEventListener('click', function () {

View File

@ -472,7 +472,12 @@
if (!match) return Promise.resolve(null);
var ctx = { match: match, href: href };
var url = p.url(ctx);
/* p.url runs synchronously (before the .catch below attaches) and
can throw, e.g. decodeURIComponent on a malformed percent
sequence in the link path. Treat a throw as "no popup". */
var url;
try { url = p.url(ctx); }
catch (e) { return Promise.resolve(null); }
var fetcher = p.fetchType === 'xml' ? fetchXml : fetchJson;
return fetcher(url, p.fetchInit).then(function (data) {
@ -951,10 +956,10 @@
var agoDays = daysBetween(start, today);
/* "~" prefix when we've rounded to a unit larger than days. */
var span = humanDuration(spanDays, true);
var ago = humanAgo(agoDays);
var ago = humanAgo(agoDays); /* '' when start is in the future */
lines.push(
'<div class="popup-date-primary">'
+ esc(span) + ' · started ' + esc(ago)
+ esc(span) + (ago ? ' · started ' + esc(ago) : '')
+ '</div>');
if (commits && /^\d+$/.test(commits)) {
var n = parseInt(commits, 10);
@ -965,10 +970,16 @@
}
} else {
var days = daysBetween(start, today);
var ago2 = humanAgo(days); /* '' when the date is in the future */
if (ago2) {
lines.push(
'<div class="popup-date-primary">'
+ esc(humanAgo(days)) + '</div>');
+ esc(ago2) + '</div>');
}
}
/* Nothing renderable (e.g. a lone future date): no popup. */
if (!lines.length) return Promise.resolve(null);
return Promise.resolve('<div class="popup-date">' + lines.join('') + '</div>');
}
@ -981,9 +992,10 @@
return isNaN(d.getTime()) ? null : d;
}
/* Whole-day difference between two Dates, floored (never negative). */
/* Whole-day difference b - a, floored. Negative when b precedes a,
so callers can detect future dates instead of mislabelling them. */
function daysBetween(a, b) {
var ms = Math.abs(b.getTime() - a.getTime());
var ms = b.getTime() - a.getTime();
return Math.floor(ms / 86400000);
}
@ -1005,9 +1017,12 @@
return (approx ? '~' : '') + y + ' year' + (y === 1 ? '' : 's');
}
/* Past-tense phrasing for a date N days in the past. */
/* Past-tense phrasing for a date N days in the past. Returns '' for
future dates (negative N), mirroring now.js, so callers render
nothing rather than a false "N days ago". */
function humanAgo(days) {
if (days <= 0) return 'today';
if (days < 0) return ''; /* future / clock skew */
if (days === 0) return 'today';
if (days === 1) return 'yesterday';
if (days < 14) return days + ' days ago';
return humanDuration(days, true) + ' ago';
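
The effect of dropping `Math.abs` is easiest to see in isolation. Below is a minimal, self-contained sketch of the signed `daysBetween` together with a simplified `humanAgo`. The simplification is an assumption: `humanDuration` is not reproduced in this hunk, so the sketch prints a plain day count for every span of two days or more instead of switching to larger units at 14 days.

```javascript
// Signed whole-day difference b - a, as in the new version of the hunk.
function daysBetween(a, b) {
  var ms = b.getTime() - a.getTime();
  return Math.floor(ms / 86400000);
}

// Simplified humanAgo: the real function delegates to humanDuration for
// spans of 14 days or more; that helper is not shown here, so this
// sketch keeps the plain "N days ago" form throughout.
function humanAgo(days) {
  if (days < 0) return '';          // future date or clock skew
  if (days === 0) return 'today';
  if (days === 1) return 'yesterday';
  return days + ' days ago';
}
```

With the old `Math.abs` version a date three days in the future was indistinguishable from one three days in the past; here the negative result lets `humanAgo` return an empty string so the caller can skip the line.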

View File

@ -23,6 +23,9 @@
/* Read ?p= from the query string for deep linking. */
var qs = new URLSearchParams(window.location.search);
/* Keep the canonical URL clean on plain loads: only sync ?p= back to
the URL when one was already present or the user navigates. */
var syncUrl = qs.has('p');
var initPage = parseInt(qs.get('p'), 10);
if (!isNaN(initPage) && initPage >= 1 && initPage <= pageCount) {
currentPage = initPage;
@ -47,7 +50,7 @@
/* Replace URL so the page is bookmarkable at the current position.
The back button still returns to the landing page. */
history.replaceState(null, '', '?p=' + currentPage);
if (syncUrl) history.replaceState(null, '', '?p=' + currentPage);
/* Preload the adjacent pages for smooth turning. */
if (currentPage > 1) new Image().src = pages[currentPage - 2];
@ -132,4 +135,5 @@
------------------------------------------------------------------ */
navigate(currentPage);
syncUrl = true; /* any later navigate() is a user action — sync from here on */
}());

View File

@ -113,12 +113,17 @@
/* ---- URL extraction ---- */
/* Normalise a URL to a pathname for lookup in epistemicMeta.
Pagefind results use full URLs; semantic results use relative paths. */
Pagefind results use full URLs; semantic results use relative paths.
epistemicMeta keys are emitted as routed paths (".../index.html"),
while result links use the clean directory form (".../"), so the
trailing-slash form must be expanded before lookup. */
function normUrl(href) {
if (!href) return null;
try {
var u = new URL(href, window.location.origin);
return u.pathname;
var p = u.pathname;
if (p.charAt(p.length - 1) === '/') p += 'index.html';
return p;
} catch (e) {
return href;
}
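
A small sketch of why the trailing-slash expansion matters for the lookup. Assumptions: `origin` is passed in as a parameter (instead of reading `window.location.origin`) so the snippet runs outside a browser, and the `meta` object and its `confidence` field are invented for illustration.

```javascript
// Sketch only. `origin` replaces window.location.origin; `meta` is made up.
function normUrl(href, origin) {
    if (!href) return null;
    try {
        var p = new URL(href, origin).pathname;
        // Clean directory links ("/essays/foo/") become routed paths.
        if (p.charAt(p.length - 1) === '/') p += 'index.html';
        return p;
    } catch (e) {
        return href;
    }
}

var origin = 'https://example.org';
var meta = { '/essays/foo/index.html': { confidence: 80 } };

var fromFullUrl = normUrl('https://example.org/essays/foo/', origin);
var fromRelative = normUrl('/essays/foo/index.html', origin);

console.log(fromFullUrl === fromRelative);   // true
console.log(meta[fromFullUrl].confidence);   // 80
```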
@ -268,7 +273,12 @@
if (!el) return;
el.addEventListener('input', function () {
var v = el.value.trim();
state[field] = v !== '' ? Math.max(0, Math.min(100, parseInt(v, 10) || 0)) : null;
var n = parseInt(v, 10);
/* Non-numeric input deactivates the filter (null) rather
than coercing to an always-matching >= 0 threshold. */
state[field] = (v !== '' && !isNaN(n))
? Math.max(0, Math.min(100, n))
: null;
loadMeta().then(applyFilters);
});
});
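
The difference between the old and new coercion can be shown side by side. This sketch extracts the expression into two hypothetical helper functions (`parseThresholdOld`, `parseThresholdNew`); the original code assigns the result to `state[field]` inline.

```javascript
// Sketch only. Both function names are hypothetical; the logic is copied
// from the removed and added lines of the hunk above.
function parseThresholdOld(raw) {
    var v = raw.trim();
    return v !== '' ? Math.max(0, Math.min(100, parseInt(v, 10) || 0)) : null;
}

function parseThresholdNew(raw) {
    var v = raw.trim();
    var n = parseInt(v, 10);
    return (v !== '' && !isNaN(n)) ? Math.max(0, Math.min(100, n)) : null;
}

console.log(parseThresholdOld('abc'));   // 0    (filter active, matches everything)
console.log(parseThresholdNew('abc'));   // null (filter inactive)
console.log(parseThresholdNew('150'));   // 100  (clamped)
console.log(parseThresholdNew('42'));    // 42
```

Note that `parseInt` is lenient about trailing characters, so an input such as `12abc` still yields 12 under both versions.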

@ -7,11 +7,18 @@
'use strict';
window.addEventListener('DOMContentLoaded', function () {
var ui = new PagefindUI({
/* If the Pagefind bundle failed to load (e.g. 404), skip only the
Pagefind setup; the rest of this handler must still run. */
var ui = null;
if (typeof PagefindUI === 'undefined') {
console.warn('search.js: PagefindUI not loaded — keyword search disabled.');
} else {
ui = new PagefindUI({
element: '#search',
showImages: false,
excerptLength: 30,
});
}
/* Timing instrumentation ------------------------------------------ */
var timingEl = document.getElementById('search-timing');
@ -46,7 +53,7 @@
/* Pre-fill from URL parameter and trigger the search -------------- */
var params = new URLSearchParams(window.location.search);
var q = params.get('q');
if (q) {
if (q && ui) {
startTime = performance.now();
ui.triggerSearch(q);
}

@ -88,6 +88,15 @@
}
function onKeyUp(e) {
/* Typing capitals in the annotation picker's note input (or any
other editable field) releases Shift; don't re-summon the
toolbar over the UI the user is typing into. */
var t = e.target;
if (t && t.nodeType === Node.ELEMENT_NODE) {
if (popup.contains(t)) return;
if (picker && picker.contains(t)) return;
if (t.isContentEditable || t.closest('input, textarea')) return;
}
if (e.shiftKey || e.key === 'End' || e.key === 'Home') {
clearTimeout(showTimer);
showTimer = setTimeout(tryShow, SHOW_DELAY);

@ -39,10 +39,17 @@
Index loading: fetch once, lazily
------------------------------------------------------------------ */
/* In-flight promise so concurrent first searches share a single
index fetch (mirrors loadModelPromise below). Without this guard,
two rapid keystrokes would each fetch semantic-index.bin and
semantic-meta.json before the first resolves. */
var loadIndexPromise = null;
function loadIndex() {
if (indexReady) return Promise.resolve();
if (loadIndexPromise) return loadIndexPromise;
return Promise.all([
loadIndexPromise = Promise.all([
fetch('/data/semantic-index.bin').then(function (r) {
if (!r.ok) throw new Error('semantic-index.bin not found');
return r.arrayBuffer();
@ -54,8 +61,23 @@
]).then(function (results) {
vectors = new Float32Array(results[0]);
meta = results[1];
/* Consistency check: a stale CDN-cached bin/json pair would
otherwise produce NaN scores and silently garbage ranking. */
if (vectors.length !== meta.length * DIM) {
console.error('semantic-search: index/meta size mismatch ('
+ vectors.length + ' floats vs ' + meta.length + ' × ' + DIM + ')');
vectors = null;
meta = null;
throw new Error('semantic index not available: index/meta size mismatch');
}
indexReady = true;
}).catch(function (err) {
/* Allow a retry on the next call instead of caching the
failed promise forever. */
loadIndexPromise = null;
throw err;
});
return loadIndexPromise;
}
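
The in-flight promise guard reduces to a few lines. A minimal sketch, assuming a hypothetical `fakeFetch` in place of the two real `fetch()` calls; the names `ready`, `inFlight`, and `loadOnce` are illustrative, not the file's identifiers.

```javascript
// Sketch only. fakeFetch is a made-up stand-in for the index/meta fetches.
var fetchCount = 0;
function fakeFetch() {
    fetchCount += 1;
    return Promise.resolve('index-bytes');
}

var ready = false;
var inFlight = null;

function loadOnce() {
    if (ready) return Promise.resolve();
    if (inFlight) return inFlight;        // share the pending load
    inFlight = fakeFetch().then(function () {
        ready = true;
    }).catch(function (err) {
        inFlight = null;                  // a failed load may be retried
        throw err;
    });
    return inFlight;
}

// Two calls before the first resolves, as with two rapid keystrokes.
var first = loadOnce();
var second = loadOnce();

console.log(first === second);   // true
console.log(fetchCount);         // 1
```

Clearing `inFlight` in the `catch` branch is what keeps a transient network failure from being cached as a permanently rejected promise.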
/* ------------------------------------------------------------------
@ -114,14 +136,23 @@
});
}
/* Generation token: each runSearch call invalidates all still-in-flight
predecessors, so a stale (earlier) query's results can never render
after a newer query's. */
var searchGeneration = 0;
function runSearch(query) {
var gen = ++searchGeneration;
query = query.trim();
if (!query) { clearResults(); return; }
setStatus('Searching…');
var indexPromise = loadIndex().catch(function (err) {
if (gen === searchGeneration) {
setStatus('Semantic index not available — run make build first.');
}
throw err;
});
var modelPromise = loadModel();
@ -130,12 +161,14 @@
var pipe = results[1];
return pipe(query, { pooling: 'mean', normalize: true });
}).then(function (output) {
if (gen !== searchGeneration) return; /* superseded by a newer query */
var queryVec = output.data; /* Float32Array, length 384 */
var scores = cosineSims(queryVec);
var hits = topK(scores);
renderResults(hits);
setStatus(hits.length ? '' : 'No results found.');
}).catch(function (err) {
if (gen !== searchGeneration) return; /* superseded by a newer query */
if (err.message && err.message.indexOf('not available') === -1) {
setStatus('Search error — see console for details.');
console.error('semantic-search:', err);
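
The generation-token idea can be demonstrated without promises. In this sketch `embedLater` is a hypothetical asynchronous step whose completion is triggered by hand, standing in for the model and index promises; `rendered` and `pending` exist only for the demonstration.

```javascript
// Sketch only. embedLater, rendered, and pending are illustrative names.
var searchGeneration = 0;
var rendered = [];
var pending = [];

function embedLater(query, done) {
    // Queue the completion so the caller decides when it "finishes".
    pending.push(function () { done(query.toUpperCase()); });
}

function runSearch(query) {
    var gen = ++searchGeneration;          // invalidates all earlier searches
    embedLater(query, function (result) {
        if (gen !== searchGeneration) return;   // superseded by a newer query
        rendered.push(result);
    });
}

runSearch('fug');
runSearch('fugue');

// Complete out of order: the newer query first, then the stale one.
pending[1]();
pending[0]();

console.log(rendered);   // [ 'FUGUE' ]
```

Without the `gen` check, the stale `'FUG'` result would have been rendered last and overwritten the newer one.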

@ -108,11 +108,26 @@
}
}
function loadTransclusion(el) {
/* Nested transclusion limits: ancestors carries the chain of srcs
* currently being expanded (cycle guard: a self-transcluding page
* must not loop), and MAX_DEPTH caps pathological nesting. */
var MAX_DEPTH = 3;
function loadTransclusion(el, depth, ancestors) {
depth = depth || 0;
ancestors = ancestors || [];
var src = el.dataset.src;
var section = el.dataset.section || null;
if (!src) return;
if (depth >= MAX_DEPTH || ancestors.indexOf(src) !== -1) {
el.classList.add('transclude--error');
el.textContent = '[transclusion omitted (cycle or depth limit): '
+ src + (section ? '#' + section : '') + ']';
return;
}
el.classList.add('transclude--loading');
fetchPage(src)
@ -138,6 +153,14 @@
el.classList.replace('transclude--loading', 'transclude--loaded');
el.appendChild(wrapper);
/* The fetched page may itself contain transclusion
placeholders; process them too, extending the
ancestor chain for cycle/depth guarding. */
var chain = ancestors.concat(src);
wrapper.querySelectorAll('div.transclude').forEach(function (nested) {
loadTransclusion(nested, depth + 1, chain);
});
reinitFragment(el);
})
.catch(function (err) {
@ -147,6 +170,8 @@
}
document.addEventListener('DOMContentLoaded', function () {
document.querySelectorAll('div.transclude').forEach(loadTransclusion);
document.querySelectorAll('div.transclude').forEach(function (el) {
loadTransclusion(el);
});
});
}());
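
The cycle and depth guards can be exercised without a DOM. This sketch assumes a made-up in-memory `pages` map (each page lists the srcs it transcludes) and a hypothetical `expand` function that records what would be loaded or omitted; only the guard condition and the ancestor-chain handling follow the hunk above.

```javascript
// Sketch only. `pages` and `expand` are illustrative, not the site's code.
var MAX_DEPTH = 3;
var pages = {
    '/a.html': ['/b.html'],
    '/b.html': ['/a.html', '/c.html'],   // /b transcludes /a back: a cycle
    '/c.html': []
};

function expand(src, depth, ancestors, log) {
    depth = depth || 0;
    ancestors = ancestors || [];
    log = log || [];
    if (depth >= MAX_DEPTH || ancestors.indexOf(src) !== -1) {
        log.push('omitted ' + src);
        return log;
    }
    log.push('loaded ' + src);
    // Extend the chain per branch; siblings do not see each other.
    var chain = ancestors.concat(src);
    (pages[src] || []).forEach(function (nested) {
        expand(nested, depth + 1, chain, log);
    });
    return log;
}

console.log(expand('/a.html'));
// [ 'loaded /a.html', 'loaded /b.html', 'omitted /a.html', 'loaded /c.html' ]
```

Because the chain is built with `concat`, it tracks only the current ancestry, so the same page may legitimately appear in two sibling branches without being flagged as a cycle.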

@ -94,6 +94,9 @@
function isDark() {
var t = document.documentElement.dataset.theme;
if (t === 'dark') return true;
/* cappuccino is a dark-brown theme (light text on #553a28); charts
need the dark palette or axis labels become unreadable. */
if (t === 'cappuccino') return true;
if (t === 'light') return false;
return window.matchMedia('(prefers-color-scheme: dark)').matches;
}

static/logo-sprite.svg (new file, 7 lines, 32 KiB; diff suppressed because one or more lines are too long)
static/og-image.png (new binary file, 79 KiB)

@ -1,13 +1,28 @@
{
"name": "levineuwirth.org",
"name": "Levi Neuwirth",
"short_name": "ln",
"description": "Personal site of Levi Neuwirth — essays, research, music, and photography.",
"start_url": "/",
"scope": "/",
"icons": [
{
"src": "/web-app-manifest-192x192.png",
"sizes": "192x192",
"type": "image/png",
"purpose": "any"
},
{
"src": "/web-app-manifest-192x192.png",
"sizes": "192x192",
"type": "image/png",
"purpose": "maskable"
},
{
"src": "/web-app-manifest-512x512.png",
"sizes": "512x512",
"type": "image/png",
"purpose": "any"
},
{
"src": "/web-app-manifest-512x512.png",
"sizes": "512x512",
@ -15,7 +30,7 @@
"purpose": "maskable"
}
],
"theme_color": "#ffffff",
"background_color": "#ffffff",
"theme_color": "#16140f",
"background_color": "#16140f",
"display": "standalone"
}

Binary file changed (1.3 KiB before, 22 KiB after).
Binary file changed (7.8 KiB before, 106 KiB after).

@ -17,24 +17,27 @@
$body$
$if(backlinks)$
<footer class="page-meta-footer">
$else$
$if(similar-links)$
<footer class="page-meta-footer">
$endif$
$endif$
$if(backlinks)$
<div class="meta-footer-full meta-footer-backlinks" id="backlinks">
<h3>Backlinks</h3>
$backlinks$
</div>
$endif$
$if(similar-links)$
<div class="meta-footer-full meta-footer-similar" id="similar-links">
<h3>Related</h3>
$similar-links$
</div>
$endif$
$if(backlinks)$
</footer>
$else$
$if(similar-links)$
<footer class="page-meta-footer">
<div class="meta-footer-full meta-footer-similar" id="similar-links">
<h3>Related</h3>
$similar-links$
</div>
</footer>
$endif$
$endif$

@ -14,8 +14,10 @@ $if(home)$<meta property="og:title" content="Levi Neuwirth">$else$$if(title)$<me
$if(description)$<meta property="og:description" content="$description$">$endif$
<meta property="og:url" content="$site-url$$url$">
$if(date)$<meta property="og:type" content="article">$else$<meta property="og:type" content="website">$endif$
<meta property="og:image" content="$site-url$/web-app-manifest-512x512.png">
<meta name="twitter:card" content="summary">
<meta property="og:image" content="$site-url$/og-image.png">
<meta property="og:image:width" content="1200">
<meta property="og:image:height" content="630">
<meta name="twitter:card" content="summary_large_image">
$if(description)$<meta name="twitter:description" content="$description$">$endif$
<link rel="icon" type="image/png" href="/favicon-96x96.png" sizes="96x96">

@ -2,7 +2,13 @@
<nav class="site-nav">
<!-- Row 1: primary links -->
<div class="nav-row-primary">
<a href="/" class="nav-logo" aria-label="Home"></a>
<!-- The mark lives in /logo-sprite.svg and is referenced via
<use> instead of being inlined: the traced path is ~33 KB,
and a per-page inline copy would dwarf most documents. CSS
custom properties (--logo-ink/--logo-bg) cascade into the
use-element shadow tree, so the two-tone cutout still
renders. -->
<a href="/" class="nav-logo" aria-label="Home"><svg class="nav-logo__mark" aria-hidden="true" focusable="false"><use href="/logo-sprite.svg#logo-mark"/></svg></a>
<div class="nav-primary">
<a href="/">Home</a>
<a href="/current.html">Current</a>

@ -7,6 +7,9 @@
<link rel="stylesheet" href="/css/base.css">
<link rel="stylesheet" href="/css/components.css">
<link rel="stylesheet" href="/css/score-reader.css">
<!-- utils.js must precede theme.js: theme.js reads saved settings via
window.lnUtils.safeStorage and silently restores nothing without it. -->
<script src="/js/utils.js"></script>
<script src="/js/theme.js"></script>
</head>
<body class="score-reader-page">

@ -49,6 +49,10 @@ EOF
bold "── new popup provider ──"
NAME=$(prompt "slug (lowercase, used as class + data-popup-source key, e.g. 'zenodo'):")
[[ -z "$NAME" ]] && { warn "slug required"; exit 1; }
# The slug is interpolated into nginx directives (location /proxy/$NAME/,
# set \$upstream_$NAME) — validate like import-photo.sh does so a space,
# ';', or '{' can't produce a config that fails to load.
[[ "$NAME" =~ ^[a-z0-9-]+$ ]] || { warn "slug must match ^[a-z0-9-]+\$"; exit 1; }
LABEL=$(prompt "display label (e.g. 'Zenodo'):")
[[ -z "$LABEL" ]] && LABEL="$NAME"
@ -107,14 +111,16 @@ fi
# ── proxy prefix + upstream host derivation ──────────────────────────
if [[ "$NEEDS_PROXY" -eq 1 ]]; then
# UPSTREAM_HOST is derived unconditionally: the no-proxy (direct CORS
# fetch) case is exactly when the host must be added to connect-src, so
# the checklist's CSP reminder below needs it populated either way.
UPSTREAM_HOST=$(printf '%s' "$API_URL" | awk -F/ '{print $3}')
if [[ "$NEEDS_PROXY" -eq 1 ]]; then
UPSTREAM_PATH=$(printf '%s' "$API_URL" | awk -F/ 'BEGIN{OFS="/"} {$1=""; $2=""; $3=""; print}' | sed 's|^///||')
PROXY_PATH="/proxy/$NAME/"
PROXY_API_URL="$PROXY_PATH${UPSTREAM_PATH%%\?*}"
[[ "$API_URL" == *"?"* ]] && PROXY_API_URL="$PROXY_API_URL?${API_URL#*\?}"
else
UPSTREAM_HOST=""
PROXY_API_URL="$API_URL"
fi
@ -205,8 +211,9 @@ cat <<EOF
EOF
if [[ "$NEEDS_PROXY" -eq 0 && -n "$UPSTREAM_HOST" ]]; then
echo " 5. In static/js/popups.js top-comment: add $UPSTREAM_HOST to the"
echo " connect-src CSP list."
echo " 5. Add https://$UPSTREAM_HOST to connect-src in"
echo " nginx/security-headers.conf (direct CORS fetches are blocked"
echo " by CSP otherwise), and mirror it in the popups.js top-comment."
fi
echo

@ -104,6 +104,30 @@ def err(msg: str) -> None:
print(f"[archive] ERROR: {msg}", file=sys.stderr)
def atomic_write_text(path: Path, text: str) -> None:
"""Write to a PID-unique temp then os.replace. PROVENANCE.json and
the generated index/state files are integrity records: an interrupt
mid-write must never leave a truncated file that the next run parses
(or mistakes for corruption); fsync makes the rename durable and the
PID suffix keeps concurrent runs from sharing a temp file."""
path.parent.mkdir(parents=True, exist_ok=True)
tmp = path.with_suffix(path.suffix + f".tmp.{os.getpid()}")
try:
with tmp.open("w", encoding="utf-8") as f:
f.write(text)
f.flush()
os.fsync(f.fileno())
os.replace(tmp, path)
except BaseException:
tmp.unlink(missing_ok=True)
raise
def atomic_write_json(path: Path, obj) -> None:
atomic_write_text(
path, json.dumps(obj, indent=2, ensure_ascii=False) + "\n")
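
For comparison, the same write-temp, fsync, rename pattern can be sketched in JavaScript on Node.js. This is an analogue of the Python helper above, not code from the repository; `atomicWriteText` and the sample file contents are assumptions made for illustration.

```javascript
// Sketch only: a Node.js analogue of the Python atomic_write_text helper.
var fs = require('fs');
var os = require('os');
var path = require('path');

function atomicWriteText(target, text) {
    fs.mkdirSync(path.dirname(target), { recursive: true });
    // PID-unique temp name so concurrent runs never share a temp file.
    var tmp = target + '.tmp.' + process.pid;
    var fd = null;
    try {
        fd = fs.openSync(tmp, 'w');
        fs.writeFileSync(fd, text, 'utf8');
        fs.fsyncSync(fd);                 // flush data before the rename
        fs.closeSync(fd);
        fd = null;
        fs.renameSync(tmp, target);       // atomic replace on the same filesystem
    } catch (err) {
        if (fd !== null) fs.closeSync(fd);
        fs.rmSync(tmp, { force: true });  // never leave the temp file behind
        throw err;
    }
}

var dir = fs.mkdtempSync(path.join(os.tmpdir(), 'atomic-'));
var out = path.join(dir, 'PROVENANCE.json');
atomicWriteText(out, JSON.stringify({ sha256: 'abc' }, null, 2) + '\n');

console.log(fs.readdirSync(dir));   // [ 'PROVENANCE.json' ]
```

A reader of the target path sees either the old complete file or the new complete file, never a truncated one, which is the property the integrity records rely on.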
# ---------------------------------------------------------------------------
# Manifest / removed.yaml
# ---------------------------------------------------------------------------
@ -119,6 +143,15 @@ def load_yaml_list(path: Path) -> list[dict]:
if not isinstance(data, list):
err(f"{path.name}: expected a YAML list, got {type(data).__name__}")
sys.exit(1)
# Validate items too: a stray scalar line (`- https://example.com`
# instead of `- url: ...`) would otherwise surface much later as an
# AttributeError deep inside fetch/wayback/check.
for i, item in enumerate(data):
if not isinstance(item, dict):
err(f"{path.name}: entry {i + 1} is not a mapping "
f"(got {type(item).__name__}: {item!r}); "
f"each entry must be `- url: ...`")
sys.exit(1)
return data
@ -241,7 +274,10 @@ def extract_text_pdf(pdf: Path, txt: Path) -> None:
"""Extract plain text from `pdf` into `txt` via pdftotext. On any
failure an empty file is written so downstream steps still find it."""
try:
subprocess.run(["pdftotext", "-q", str(pdf), str(txt)], check=True)
# `--` ends option parsing so a slug starting with `-` cannot be
# mistaken for a pdftotext option.
subprocess.run(["pdftotext", "-q", "--", str(pdf), str(txt)],
check=True)
except (subprocess.CalledProcessError, FileNotFoundError) as exc:
err(f"{pdf.name}: pdftotext failed ({exc}); writing empty text sidecar")
txt.write_text("", encoding="utf-8")
@ -263,6 +299,51 @@ def find_monolith() -> str | None:
return shutil.which("monolith")
MONOLITH_VERSION_FILE = REPO_ROOT / "tools" / "monolith-version.txt"
# Binaries already verified this run — the pin check hashes the binary
# once, not once per snapshot.
_monolith_verified: set[str] = set()
def _pinned_monolith_sha256() -> str | None:
"""Parse the `sha256 = <hex>` line from tools/monolith-version.txt.
Returns None when the file is missing or unparseable (the caller
warns and continues; only a *mismatch* is fatal)."""
try:
text = MONOLITH_VERSION_FILE.read_text(encoding="utf-8")
except OSError:
return None
m = re.search(r"^\s*sha256\s*=\s*([0-9a-fA-F]{64})\s*$",
text, re.MULTILINE)
return m.group(1).lower() if m else None
def verify_monolith(mono: str) -> None:
"""Integrity gate for the snapshot tool itself: the binary that
produces committed artifacts must match the SHA-256 pinned in
tools/monolith-version.txt. A mismatch is an integrity error (print
loudly, exit non-zero, halt `make build`); a missing or unparseable
version file is a warning only."""
if mono in _monolith_verified:
return
pinned = _pinned_monolith_sha256()
if pinned is None:
print(f"[archive] WARNING: {MONOLITH_VERSION_FILE.name} is missing "
f"or has no parseable `sha256 = …` line — monolith binary "
f"integrity NOT verified ({mono})", file=sys.stderr)
_monolith_verified.add(mono)
return
live = sha256_of(Path(mono))
if live != pinned:
err(f"monolith binary {mono} fails SHA-256 verification "
f"(pinned {pinned}, found {live}). The snapshot tool's bytes "
f"do not match tools/monolith-version.txt — re-vendor the "
f"binary or update the pin (see that file's instructions).")
sys.exit(1)
_monolith_verified.add(mono)
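
The three outcomes of the pin check (verified, mismatch, unverifiable) can be sketched in JavaScript on Node.js. This is an analogue of the Python logic, not the repository's code: `parsePinnedSha256`, `sha256Hex`, and `checkPin` are hypothetical names, and the version-file text and "binary" bytes are made up.

```javascript
// Sketch only: a Node.js analogue of the pin parsing and comparison.
var crypto = require('crypto');

function parsePinnedSha256(text) {
    // Same shape as the Python regex: a `sha256 = <64 hex chars>` line.
    var m = /^\s*sha256\s*=\s*([0-9a-fA-F]{64})\s*$/m.exec(text);
    return m ? m[1].toLowerCase() : null;
}

function sha256Hex(bytes) {
    return crypto.createHash('sha256').update(bytes).digest('hex');
}

function checkPin(versionFileText, binaryBytes) {
    var pinned = parsePinnedSha256(versionFileText);
    if (pinned === null) return 'unverified';   // warning only, not fatal
    return sha256Hex(binaryBytes) === pinned ? 'ok' : 'mismatch';
}

var binary = Buffer.from('pretend these are the tool bytes');
var versionText = 'version = 0.0.0\nsha256 = ' + sha256Hex(binary).toUpperCase() + '\n';

console.log(checkPin(versionText, binary));                    // ok
console.log(checkPin(versionText, Buffer.from('tampered')));   // mismatch
console.log(checkPin('no pin line here', binary));             // unverified
```

Lower-casing the parsed pin before comparison is what lets an upper-case hash in the version file still match the lower-case digest.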
def body_noarchive(path: Path) -> bool:
"""True if the snapshot declares <meta name=robots ... noarchive> —
the in-document equivalent of the X-Robots-Tag header."""
@ -327,6 +408,7 @@ def fetch_html(url: str, dest: Path) -> bool:
f"tools/bin/monolith (see tools/monolith-version.txt) or set "
f"$MONOLITH_BIN; HTML snapshot skipped")
return False
verify_monolith(mono)
source = dest.with_suffix(dest.suffix + ".source.part")
tmp = dest.with_suffix(dest.suffix + ".part")
@ -715,10 +797,7 @@ def cmd_fetch() -> int:
"snapshot-quality": quality,
"wayback": None,
}
prov_path.write_text(
json.dumps(prov, indent=2, ensure_ascii=False) + "\n",
encoding="utf-8",
)
atomic_write_json(prov_path, prov)
log(f"{slug}: archived [{atype}, {quality}] ({prov['bytes']} bytes)")
# --- contribute to the Hakyll index -------------------------------
@ -730,11 +809,7 @@ def cmd_fetch() -> int:
}
# archive-index.json is always rewritten to mirror the manifest exactly.
INDEX_OUT.parent.mkdir(parents=True, exist_ok=True)
INDEX_OUT.write_text(
json.dumps(index, indent=2, ensure_ascii=False) + "\n",
encoding="utf-8",
)
atomic_write_json(INDEX_OUT, index)
log(f"wrote {INDEX_OUT.relative_to(REPO_ROOT)} ({len(index)} entries)")
if skipped:
@ -785,14 +860,18 @@ def cmd_refresh(argv: list[str]) -> int:
try:
prev = json.loads(prov_path.read_text(encoding="utf-8"))
prev_sha = prev.get("sha256")
prev_artifact = slug_dir / prev.get("artifact", "")
prev_art_name = prev.get("artifact") or ""
prev_artifact = slug_dir / prev_art_name
except Exception as exc: # noqa: BLE001
err(f"refresh: cannot parse prior provenance for {slug}: {exc}")
return 2
# The prior snapshot must be committed and clean — otherwise
# `previous-sha256` would point at bytes git can no longer give
# back, breaking the auditable replacement contract.
if not prev_sha or not prev_artifact.exists():
# back, breaking the auditable replacement contract. The empty-
# artifact guard matters: without it prev_artifact would be the
# slug directory itself, which exists() accepts and sha256_of
# then crashes on with IsADirectoryError.
if not prev_sha or not prev_art_name or not prev_artifact.is_file():
err(f"refresh: prior snapshot for {slug} is incomplete; restore "
f"its artifact and provenance before replacing it.")
return 2
@ -850,11 +929,7 @@ def cmd_refresh(argv: list[str]) -> int:
if art_name and (slug_dir / art_name).exists():
if prev_sha:
new_prov["previous-sha256"] = prev_sha
prov_path.write_text(
json.dumps(new_prov, indent=2,
ensure_ascii=False) + "\n",
encoding="utf-8",
)
atomic_write_json(prov_path, new_prov)
log(f"refresh: recorded previous-sha256 "
f"{prev_sha[:12]}")
succeeded = True
@ -893,7 +968,11 @@ def wayback_save(url: str) -> None:
"""Trigger a fresh Wayback capture via Save Page Now. Best-effort: any
outcome is tolerated; the resulting URL is read back via the
availability API (which also surfaces a pre-existing capture)."""
req = urllib.request.Request("https://web.archive.org/save/" + url,
# Quote only what can't appear raw in a request line (spaces,
# control chars); URL structure (:/?&=#) passes through so Save
# Page Now sees the original URL shape.
req = urllib.request.Request(
"https://web.archive.org/save/" + quote(url, safe=":/?&=#"),
headers={"User-Agent": USER_AGENT})
try:
with urllib.request.urlopen(req, timeout=WAYBACK_TIMEOUT):
@ -951,10 +1030,7 @@ def cmd_wayback() -> int:
capture = wayback_lookup(url)
if capture:
prov["wayback"] = capture
prov_path.write_text(
json.dumps(prov, indent=2, ensure_ascii=False) + "\n",
encoding="utf-8",
)
atomic_write_json(prov_path, prov)
log(f"{slug}: wayback -> {capture}")
backfilled += 1
else:
@ -1073,11 +1149,7 @@ def cmd_check() -> int:
note = f" -> {new_url}" if new_url else ""
log(f"check: {url} [{rec['status']}]{note}")
STATE_OUT.parent.mkdir(parents=True, exist_ok=True)
STATE_OUT.write_text(
json.dumps(state, indent=2, ensure_ascii=False) + "\n",
encoding="utf-8",
)
atomic_write_json(STATE_OUT, state)
log(f"check: {tally['live']} live, {tally['moved']} moved, "
f"{tally['error']} error, {tally['rotted']} rotted "
f"-> {STATE_OUT.relative_to(REPO_ROOT)}")

@ -32,7 +32,11 @@ while IFS= read -r -d '' img; do
skipped=$((skipped + 1))
else
echo " webp ${img#"$REPO_ROOT/"}"
cwebp -quiet -q 85 "$img" -o "$webp"
# Write to a temp name then move: an interrupted cwebp would
# otherwise leave a truncated .webp that is newer than its
# source, which the staleness gate above then skips forever.
cwebp -quiet -q 85 "$img" -o "$webp.part"
mv "$webp.part" "$webp"
converted=$((converted + 1))
fi
done < <(find "$REPO_ROOT/static" "$REPO_ROOT/content" \

@ -7,8 +7,9 @@
# the site, no third-party request at view time.
#
# Run once before deploying. The vendored copy is gitignored
# (~150 KB total); re-running is safe — the script skips when the
# files already exist.
# (~150 KB total); re-running is safe — files that already exist AND
# match their pinned checksum are skipped; anything missing or
# mismatched is re-fetched.
#
# To bump the pinned versions, set LEAFLET_VERSION / MARKERCLUSTER_VERSION,
# re-run, then update tools/leaflet-checksums.sha256 with the new hashes.
@ -39,13 +40,6 @@ files_to_fetch=(
"$UNPKG_MC|MarkerCluster.Default.css|leaflet.markercluster-${MARKERCLUSTER_VERSION}-MarkerCluster.Default.css"
)
# Skip the whole step if the canonical entry-point already exists.
# Force a re-fetch by removing the directory.
if [ -f "$LEAFLET_DIR/leaflet.js" ] && [ -f "$LEAFLET_DIR/leaflet.markercluster.js" ]; then
echo "leaflet: already vendored at $LEAFLET_DIR (skipping)"
exit 0
fi
mkdir -p "$LEAFLET_DIR/images"
verify_or_warn() {
@ -71,15 +65,35 @@ verify_or_warn() {
fi
}
# Per-file skip: existing files are skipped only after re-verifying
# their checksum, so a partial or tampered file from an interrupted
# earlier run can never be silently accepted. Downloads land in a
# .part temp and are only moved into place after verification — a
# failed verification leaves nothing at the final path.
for entry in "${files_to_fetch[@]}"; do
IFS='|' read -r url_base local_path pin_key <<<"$entry"
src_name="${local_path##*/}"
target="$LEAFLET_DIR/$local_path"
mkdir -p "$(dirname "$target")"
if [ -f "$target" ]; then
if verify_or_warn "$target" "$pin_key"; then
echo "leaflet: $local_path present and verified (skipping)"
continue
fi
echo "leaflet: $local_path failed verification — re-fetching" >&2
rm -f "$target"
fi
echo "leaflet: fetching $local_path ($pin_key)"
curl -fsSL --progress-bar "$url_base/$src_name" -o "$target"
verify_or_warn "$target" "$pin_key"
tmp="$target.part"
curl -fsSL --progress-bar "$url_base/$src_name" -o "$tmp"
if ! verify_or_warn "$tmp" "$pin_key"; then
rm -f "$tmp"
echo "leaflet: refusing to vendor unverified $local_path" >&2
exit 1
fi
mv "$tmp" "$target"
done
echo "leaflet: vendored to $LEAFLET_DIR"

@ -68,8 +68,13 @@ fetch() {
return
fi
echo " fetch $src"
curl -fsSL --progress-bar "$BASE_URL/$src" -o "$dst"
verify_sha "$src" "$dst"
# Download to a temp name and move into place only after
# verification: an interrupted curl must never leave a partial
# file at the final path, where the present-file skip (or, for an
# unpinned file, nothing at all) would accept it forever.
curl -fsSL --progress-bar "$BASE_URL/$src" -o "$dst.part"
verify_sha "$src" "$dst.part"
mv "$dst.part" "$dst"
}
if [ ! -f "$CHECKSUMS" ]; then

@ -5,20 +5,36 @@ embed.py — Build-time embedding pipeline.
Produces two outputs from _site/**/*.html:
data/similar-links.json Page-level similarity (for "Related" footer section)
data/semantic-index.bin Paragraph vectors as raw Float32 array (N × DIM)
data/semantic-index.bin Paragraph vectors as raw Float32 array (N × PARA_DIM)
data/semantic-meta.json Paragraph metadata: [{url, title, heading, excerpt}]
Both use all-MiniLM-L6-v2 (384 dims), the same model shipped to the browser
via transformers.js for query-time semantic search.
Two models, one process:
* Pages use nomic-embed-text-v1.5 (768 dims): build-time only, never
shipped to the browser. Chosen for its well-separated cosine scores on
small corpora, which keeps the MIN_SCORE gate meaningful so every essay
reliably gets a "Related" footer section.
* Paragraphs use all-MiniLM-L6-v2 (384 dims): must match what the
browser runs via transformers.js (static/js/semantic-search.js) since
query vectors are dotted against the shipped index.
Called by `make build` when .venv exists. Failures are non-fatal.
Staleness check: skips if all output files are newer than every HTML in _site/.
Staleness: both passes are content-hash cached (data/embed-cache-*.npz),
so an unchanged site re-embeds nothing and loads no model; only the
HTML extraction pass runs. There is deliberately no mtime-based skip:
stamp-build-time.py rewrites every page's footer after this script runs,
so "are outputs newer than the HTML" is always false and a check based
on it can never fire.
"""
import hashlib
import json
import os
import re
import sys
import zipfile
from pathlib import Path
import faiss
@ -35,13 +51,48 @@ SITE_DIR = REPO_ROOT / "_site"
SIMILAR_OUT = REPO_ROOT / "data" / "similar-links.json"
SEMANTIC_BIN = REPO_ROOT / "data" / "semantic-index.bin"
SEMANTIC_META = REPO_ROOT / "data" / "semantic-meta.json"
# Content-addressed caches, one per pass. Keyed by sha256 of the (prefixed)
# input text; invalidated wholesale on model name/revision/dim change.
# Gitignored — build artifacts, not source. Survive `make clean`.
PAGE_CACHE = REPO_ROOT / "data" / "embed-cache-pages.npz"
PARA_CACHE = REPO_ROOT / "data" / "embed-cache-paragraphs.npz"
MODEL_NAME = "sentence-transformers/all-MiniLM-L6-v2"
# Pinned to a specific HuggingFace commit so a future model bump can't
# silently change embedding semantics. Bump deliberately when validating
# (and re-run a full embed pass to refresh data/semantic-* + similar-links).
MODEL_REVISION = "c9745ed1d9f207416be6d2e6f8de32d1f16199bf"
DIM = 384
# Two models, deliberately split:
#
# PARA_MODEL — embeds paragraphs for data/semantic-index.bin. This index
# is fetched by the browser at /search/ and ranked against query vectors
# computed client-side. The client (static/js/semantic-search.js) embeds
# queries with MiniLM-L6-v2 via transformers.js, so the build-time model
# must match exactly — both the architecture and the embedding dimension
# are part of the wire contract.
#
# PAGE_MODEL — embeds full pages for data/similar-links.json. This file
# is consumed only at Hakyll-build time (SimilarLinks.hs) and never
# shipped to the browser, so it is free to use a different, stronger
# model. nomic-embed-text-v1.5 produces well-separated cosine scores on
# small corpora (top neighbours at 0.7 to 0.9 instead of MiniLM's compressed
# 0.1 to 0.3), so the MIN_SCORE gate below is meaningful and every essay
# reliably gets a "Related" footer section.
#
# Both pins are deliberate. Bump only when validating, and re-run a full
# embed pass to refresh the corresponding output files.
PARA_MODEL_NAME = "sentence-transformers/all-MiniLM-L6-v2"
PARA_MODEL_REVISION = "c9745ed1d9f207416be6d2e6f8de32d1f16199bf"
PARA_DIM = 384
PAGE_MODEL_NAME = "nomic-ai/nomic-embed-text-v1.5"
PAGE_MODEL_REVISION = "e9b6763023c676ca8431644204f50c2b100d9aab"
# The weights repo above declares its modeling code via auto_map in a
# SEPARATE repo (nomic-ai/nomic-bert-2048), which `revision=` does NOT
# pin — without this second pin, trust_remote_code executes whatever is
# at that repo's head at build time.
PAGE_MODEL_CODE_REVISION = "7710840340a098cfb869c4f65e87cf2b1b70caca"
PAGE_DIM = 768
# Nomic requires task-prefixed input. Documents (corpus side) get
# "search_document: "; queries would get "search_query: ". similar-links
# only ever embeds documents, so the prefix is constant here.
PAGE_PREFIX = "search_document: "
TOP_N = 5 # similar-links: neighbours per page
MIN_SCORE = 0.30 # similar-links: discard weak matches
@ -69,33 +120,111 @@ PORTAL_BODY_ATTR = "data-portal"
def atomic_write_bytes(path: Path, data: bytes) -> None:
"""Write to path.tmp then os.replace, so an interrupt mid-write
cannot leave a truncated file that the next build/serve loads."""
"""Write to a PID-unique temp then os.replace: an interrupt mid-write
cannot leave a truncated file at the final path, fsync makes the
rename durable across power loss, and the PID suffix keeps two
concurrent runs from interleaving writes into one temp file."""
path.parent.mkdir(parents=True, exist_ok=True)
tmp = path.with_suffix(path.suffix + ".tmp")
tmp.write_bytes(data)
tmp = path.with_suffix(path.suffix + f".tmp.{os.getpid()}")
try:
with tmp.open("wb") as f:
f.write(data)
f.flush()
os.fsync(f.fileno())
os.replace(tmp, path)
except BaseException:
tmp.unlink(missing_ok=True)
raise
def atomic_write_text(path: Path, text: str) -> None:
atomic_write_bytes(path, text.encode("utf-8"))
# ---------------------------------------------------------------------------
# Page-embedding cache
# ---------------------------------------------------------------------------
#
# Loading the nomic model and embedding 26 pages on CPU takes ~3 minutes
# every `make build`. Pages rarely change between builds — usually one
# essay is edited and everything else is identical. This cache stores
# one nomic vector per page content hash so unchanged pages are reused
# verbatim and only edited/new pages are re-embedded. A fully-warm cache
# skips the model load entirely.
def content_hash(text: str) -> str:
return hashlib.sha256(text.encode("utf-8")).hexdigest()
def load_vec_cache(path: Path, model: str, revision: str,
dim: int) -> dict[str, np.ndarray]:
"""Load {hash: vector} from disk. Returns an empty dict if the cache
is absent, unreadable, or pinned to a different model; in those
cases save_vec_cache() will overwrite the stale file on next save."""
if not path.exists():
return {}
try:
npz = np.load(path, allow_pickle=False)
if (npz["model"].item() != model or
npz["revision"].item() != revision or
int(npz["dim"].item()) != dim):
return {}
hashes = npz["hashes"]
vectors = npz["vectors"]
if vectors.shape != (len(hashes), dim):
return {}
return {h.item(): vectors[i] for i, h in enumerate(hashes)}
except (OSError, KeyError, ValueError, EOFError,
zipfile.BadZipFile) as e:
print(f"embed.py: cache {path.name} unreadable ({e}) — discarding",
file=sys.stderr)
return {}
def save_vec_cache(path: Path, model: str, revision: str, dim: int,
cache: dict[str, np.ndarray]) -> None:
"""Atomically persist {hash: vector}. Empty cache writes an empty
file so a subsequent load returns {} cleanly (instead of falling
through to the "no file" path)."""
if cache:
hashes = np.array(list(cache.keys()))
vectors = np.stack(list(cache.values())).astype(np.float32)
else:
hashes = np.array([], dtype="U64")
vectors = np.zeros((0, dim), dtype=np.float32)
path.parent.mkdir(parents=True, exist_ok=True)
# Pass an open file handle, not a path: np.savez_compressed appends
# ".npz" to bare paths, which would mangle our atomic-rename target.
# PID-unique temp so concurrent runs can't interleave; fsync so the
# rename is durable.
tmp = path.with_suffix(path.suffix + f".tmp.{os.getpid()}")
try:
with open(tmp, "wb") as f:
np.savez_compressed(
f,
model=model,
revision=revision,
dim=dim,
hashes=hashes,
vectors=vectors,
)
f.flush()
os.fsync(f.fileno())
os.replace(tmp, path)
except BaseException:
tmp.unlink(missing_ok=True)
raise
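
The content-hash cache behaviour (only new or edited inputs are embedded) can be sketched in JavaScript on Node.js. This is an analogue of the Python design, not the repository's code: `fakeEmbed` is a hypothetical stand-in for the model call, `embedWithCache` is an illustrative name, and an in-memory `Map` replaces the `.npz` file.

```javascript
// Sketch only: a Node.js analogue of the {hash: vector} cache.
var crypto = require('crypto');

function contentHash(text) {
    return crypto.createHash('sha256').update(text, 'utf8').digest('hex');
}

var embedCalls = 0;
function fakeEmbed(text) {
    embedCalls += 1;          // count how often the "model" actually runs
    return [text.length];     // made-up one-dimensional "vector"
}

function embedWithCache(texts, cache) {
    return texts.map(function (t) {
        var h = contentHash(t);
        if (!cache.has(h)) cache.set(h, fakeEmbed(t));   // miss: embed and store
        return cache.get(h);                             // hit: reuse verbatim
    });
}

var cache = new Map();
// First build: both pages are misses.
embedWithCache(['search_document: essay one', 'search_document: essay two'], cache);
// Second build: one page unchanged, one edited.
embedWithCache(['search_document: essay one', 'search_document: essay two (edited)'], cache);

console.log(embedCalls);   // 3
console.log(cache.size);   // 3
```

Hashing the prefixed input rather than the raw page text means a change of prefix also invalidates the cached vectors, which matches keying the cache on "(prefixed) input text" as the comments above describe.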
STRIP_SELECTORS = [
"nav", "footer", "#toc", ".link-popup", "script", "style",
".page-meta-footer", ".metadata", "[data-pagefind-ignore]",
# The no-JS footnotes fallback duplicates each sidenote's text
# verbatim at the document end — indexing it would double every
# footnote in search results and skew page similarity.
"section.footnotes",
]
# ---------------------------------------------------------------------------
# Staleness check
# ---------------------------------------------------------------------------
def needs_update() -> bool:
outputs = [SIMILAR_OUT, SEMANTIC_BIN, SEMANTIC_META]
if not all(p.exists() for p in outputs):
return True
oldest = min(p.stat().st_mtime for p in outputs)
return any(html.stat().st_mtime > oldest for html in SITE_DIR.rglob("*.html"))
# ---------------------------------------------------------------------------
# HTML parsing helpers
# ---------------------------------------------------------------------------
@ -191,10 +320,6 @@ def main() -> int:
print("embed.py: _site/ not found — skipping", file=sys.stderr)
return 0
if not needs_update():
print("embed.py: all outputs up to date — skipping")
return 0
# --- Extract pages + paragraphs in one pass ---
print("embed.py: extracting pages…")
pages = []
@ -211,18 +336,44 @@ def main() -> int:
print("embed.py: no indexable pages found", file=sys.stderr)
return 0
# --- Load model once for both tasks ---
print(f"embed.py: loading {MODEL_NAME}@{MODEL_REVISION[:8]}")
model = SentenceTransformer(MODEL_NAME, revision=MODEL_REVISION)
# --- Similar-links (page level, nomic, content-hash cached) ---
cache = load_vec_cache(PAGE_CACHE, PAGE_MODEL_NAME,
PAGE_MODEL_REVISION, PAGE_DIM)
page_inputs = [PAGE_PREFIX + p["text"] for p in pages]
hashes = [content_hash(t) for t in page_inputs]
miss_idxs = [i for i, h in enumerate(hashes) if h not in cache]
# --- Similar-links (page level) ---
print(f"embed.py: embedding {len(pages)} pages…")
page_vecs = model.encode(
[p["text"] for p in pages],
print(f"embed.py: pages: {len(pages) - len(miss_idxs)} cached / "
f"{len(miss_idxs)} to embed")
if miss_idxs:
print(f"embed.py: loading {PAGE_MODEL_NAME}@{PAGE_MODEL_REVISION[:8]}")
page_model = SentenceTransformer(
PAGE_MODEL_NAME, revision=PAGE_MODEL_REVISION, trust_remote_code=True,
# code_revision pins the auto_map modeling repo; it must reach
# both AutoConfig and AutoModel.from_pretrained.
model_kwargs={"code_revision": PAGE_MODEL_CODE_REVISION},
config_kwargs={"code_revision": PAGE_MODEL_CODE_REVISION},
)
new_vecs = page_model.encode(
[page_inputs[i] for i in miss_idxs],
normalize_embeddings=True,
show_progress_bar=True,
batch_size=64,
batch_size=8,
).astype(np.float32)
for i, vec in zip(miss_idxs, new_vecs):
cache[hashes[i]] = vec
# Drop the model before loading MiniLM below; sentence-transformers
# holds the full weight tensor in RAM until GC runs.
del page_model
# Assemble page_vecs in the original pages[] order.
page_vecs = np.stack([cache[h] for h in hashes]).astype(np.float32)
# Prune the cache to only currently-present hashes so a deleted page
# doesn't keep its vector around forever. Then persist.
save_vec_cache(PAGE_CACHE, PAGE_MODEL_NAME, PAGE_MODEL_REVISION,
PAGE_DIM, {h: cache[h] for h in hashes})
index = faiss.IndexFlatIP(page_vecs.shape[1])
index.add(page_vecs)
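The hit/miss partition and prune step used for the page cache can be sketched without the model or NumPy. This is an illustrative sketch under stated assumptions: `sketch_content_hash` is a hypothetical stand-in (the real content_hash() is not shown in this diff), and `fake_embed` replaces the encoder with a toy function.

```python
import hashlib


def sketch_content_hash(text: str) -> str:
    # Hypothetical stand-in for content_hash(); SHA-256 is an assumption.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def embed_with_cache(inputs, cache, embed_fn):
    """Return (vectors in input order, pruned cache, number of misses)."""
    hashes = [sketch_content_hash(t) for t in inputs]
    miss_idxs = [i for i, h in enumerate(hashes) if h not in cache]
    if miss_idxs:
        # Only inputs whose hash is unknown are sent to the encoder.
        new_vecs = embed_fn([inputs[i] for i in miss_idxs])
        for i, vec in zip(miss_idxs, new_vecs):
            cache[hashes[i]] = vec
    vectors = [cache[h] for h in hashes]
    # Keep only hashes that are still present, so deleted inputs drop out.
    pruned = {h: cache[h] for h in hashes}
    return vectors, pruned, len(miss_idxs)


def fake_embed(texts):
    # Toy "embedding": a one-element vector holding the text length.
    return [[float(len(t))] for t in texts]
```

Because the key is a hash of the exact model input (prefix included), any edit to a page changes its key and forces a re-embed, while untouched pages are served from the cache.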
@ -245,18 +396,38 @@ def main() -> int:
atomic_write_text(SIMILAR_OUT, json.dumps(similar, ensure_ascii=False, indent=2))
print(f"embed.py: wrote {len(similar)} similar-links entries")
# --- Semantic index (paragraph level) ---
# --- Semantic index (paragraph level, MiniLM, content-hash cached) ---
if not paragraphs:
print("embed.py: no paragraphs extracted — skipping semantic index")
return 0
print(f"embed.py: embedding {len(paragraphs)} paragraphs…")
para_vecs = model.encode(
[p["text"] for p in paragraphs],
pcache = load_vec_cache(PARA_CACHE, PARA_MODEL_NAME,
PARA_MODEL_REVISION, PARA_DIM)
para_inputs = [p["text"] for p in paragraphs]
para_hashes = [content_hash(t) for t in para_inputs]
para_miss = [i for i, h in enumerate(para_hashes) if h not in pcache]
print(f"embed.py: paragraphs: {len(paragraphs) - len(para_miss)} cached / "
f"{len(para_miss)} to embed")
if para_miss:
print(f"embed.py: loading {PARA_MODEL_NAME}@{PARA_MODEL_REVISION[:8]}")
para_model = SentenceTransformer(PARA_MODEL_NAME,
revision=PARA_MODEL_REVISION)
new_para_vecs = para_model.encode(
[para_inputs[i] for i in para_miss],
normalize_embeddings=True,
show_progress_bar=True,
batch_size=64,
).astype(np.float32)
for i, vec in zip(para_miss, new_para_vecs):
pcache[para_hashes[i]] = vec
del para_model
# Assemble in original paragraph order; prune + persist the cache.
para_vecs = np.stack([pcache[h] for h in para_hashes]).astype(np.float32)
save_vec_cache(PARA_CACHE, PARA_MODEL_NAME, PARA_MODEL_REVISION,
PARA_DIM, {h: pcache[h] for h in para_hashes})
atomic_write_bytes(SEMANTIC_BIN, para_vecs.tobytes())


@ -31,6 +31,7 @@ images are logged and the rest of the walk continues.
from __future__ import annotations
import os
import sys
from pathlib import Path
from typing import Any
@ -62,13 +63,20 @@ def _is_stale(image: Path, sidecar: Path) -> bool:
def _atomic_write_yaml(path: Path, data: dict[str, Any]) -> None:
tmp = path.with_suffix(path.suffix + ".tmp")
# PID-unique temp (concurrent runs can't share it), removed on
# failure. No fsync: sidecars are regenerated from the photo on the
# next build, so a lost rename costs one re-extraction, not data.
tmp = path.with_suffix(path.suffix + f".tmp.{os.getpid()}")
try:
with tmp.open("w", encoding="utf-8") as f:
# Preserve a stable key order (width before height) so a manual
# diff stays easy to read across regenerations.
ordered = {k: data[k] for k in ("width", "height") if k in data}
yaml.safe_dump(ordered, f, sort_keys=False, allow_unicode=True)
tmp.replace(path)
except BaseException:
tmp.unlink(missing_ok=True)
raise
def _read_dimensions(image: Path) -> dict[str, int]:


@ -36,6 +36,7 @@ images are logged and the rest of the walk continues.
from __future__ import annotations
import json
import os
import shutil
import subprocess
import sys
@ -133,6 +134,12 @@ def _read_exif_via_exiftool(image: Path) -> dict[str, Any]:
entry. Numeric values come through as numbers; text values as
strings. We accept missing keys silently.
"""
# exiftool does not reliably support `--` as an end-of-options
# marker, so make the path argument non-option-shaped instead: a
# relative path is prefixed with ./ so it can never start with `-`.
image_arg = str(image)
if not os.path.isabs(image_arg):
image_arg = os.path.join(os.curdir, image_arg)
result = subprocess.run(
[
"exiftool",
@ -156,7 +163,7 @@ def _read_exif_via_exiftool(image: Path) -> dict[str, Any]:
"-ImageWidth",
"-ImageHeight",
"-n", # numeric output for shutter/aperture/GPS/dimensions
str(image),
image_arg,
],
capture_output=True,
text=True,
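The path handling described in the comment above (prefix relative paths so the argument can never look like an option) is small enough to show on its own. A minimal sketch; the function name `non_option_path_arg` is hypothetical and not part of the script.

```python
import os


def non_option_path_arg(path: str) -> str:
    """Return a path argument that cannot be parsed as a command-line option."""
    if not os.path.isabs(path):
        # "./-weird.jpg" names the same file as "-weird.jpg" but starts with ".".
        return os.path.join(os.curdir, path)
    # Absolute paths start with a root or drive, never with "-".
    return path
```

This avoids relying on an end-of-options marker, which the comment notes exiftool does not reliably support.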
@ -374,12 +381,19 @@ def _is_stale(image: Path, sidecar: Path) -> bool:
def _atomic_write_yaml(path: Path, data: dict[str, Any]) -> None:
tmp = path.with_suffix(path.suffix + ".tmp")
# PID-unique temp (concurrent runs can't share it), removed on
# failure. No fsync: sidecars are regenerated from the photo on the
# next build, so a lost rename costs one re-extraction, not data.
tmp = path.with_suffix(path.suffix + f".tmp.{os.getpid()}")
try:
with tmp.open("w", encoding="utf-8") as f:
# Preserve the SIDECAR_KEYS order so a manual diff is easy to read.
ordered = {k: data[k] for k in SIDECAR_KEYS if k in data}
yaml.safe_dump(ordered, f, sort_keys=False, allow_unicode=True)
tmp.replace(path)
except BaseException:
tmp.unlink(missing_ok=True)
raise
def _read_one(image: Path) -> dict[str, Any]:


@ -23,6 +23,7 @@ a palette extraction error.
from __future__ import annotations
import os
import sys
from pathlib import Path
from typing import Any
@ -62,10 +63,17 @@ def _is_stale(image: Path, sidecar: Path) -> bool:
def _atomic_write_yaml(path: Path, data: dict[str, Any]) -> None:
tmp = path.with_suffix(path.suffix + ".tmp")
# PID-unique temp (concurrent runs can't share it), removed on
# failure. No fsync: sidecars are regenerated from the photo on the
# next build, so a lost rename costs one re-extraction, not data.
tmp = path.with_suffix(path.suffix + f".tmp.{os.getpid()}")
try:
with tmp.open("w", encoding="utf-8") as f:
yaml.safe_dump(data, f, sort_keys=False, allow_unicode=True)
tmp.replace(path)
except BaseException:
tmp.unlink(missing_ok=True)
raise
def _extract_palette(image: Path) -> list[str]:


@ -20,9 +20,11 @@
set -u
# Newly-added .md files under content/essays/ in this commit.
# `--name-status` output is TAB-separated (status<TAB>path); split on the
# tab so paths containing spaces survive intact.
mapfile -t added < <(
git diff --cached --name-status --diff-filter=A -- 'content/essays/*.md' \
| awk '{ print $2 }'
| cut -f2-
)
if [[ ${#added[@]} -eq 0 ]]; then
@ -47,8 +49,10 @@ for path in "${added[@]}"; do
# Best-effort frontmatter probe: does any line in the YAML head
# block start with `status:`? Avoids a YAML dependency in the
# hook, which has to run before the build environment is sourced.
if awk '/^---$/{f++; next} f==1 && /^status:[[:space:]]*[^[:space:]]/{print; exit}' \
-- "$path" \
# Probe the STAGED blob (`git show :path`), not the working tree —
# the commit contains the index content, which may differ.
if git show ":$path" 2>/dev/null \
| awk '/^---$/{f++; next} f==1 && /^status:[[:space:]]*[^[:space:]]/{print; exit}' \
| grep -q .; then
has_status=1
fi
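The switch from whitespace splitting to tab splitting in the hook can be illustrated in Python. This is a sketch only: `added_paths` is a hypothetical helper, and it ignores the quoting git applies to paths with unusual characters.

```python
def added_paths(name_status_output: str) -> list[str]:
    """Extract paths of added files from `git diff --name-status` style output."""
    paths = []
    for line in name_status_output.splitlines():
        if not line:
            continue
        # Split on the first TAB only: status on the left, full path on the right.
        status, _, path = line.partition("\t")
        if status == "A" and path:
            paths.append(path)
    return paths


sample = "A\tcontent/essays/on writing well.md\nA\tcontent/essays/b.md\n"
# Whitespace splitting (the old awk '{ print $2 }' behaviour) truncates the path.
truncated = sample.splitlines()[0].split()[1]
```

Here `truncated` is `content/essays/on`, which is the failure mode the `cut -f2-` change removes.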


@ -148,7 +148,14 @@ fi
echo "import-photo: stripping EXIF from delivered file..."
magick mogrify -strip "$TARGET" \
|| { echo "import-photo: magick mogrify -strip failed for $TARGET (EXIF NOT stripped)" >&2; exit 1; }
|| {
# The copy under content/ still carries full EXIF (GPS, serial
# numbers); the Makefile's `git add content/` could auto-commit
# and publish it. Remove it before bailing out.
rm -f -- "$TARGET"
echo "import-photo: magick mogrify -strip failed for $TARGET (EXIF NOT stripped); deleted the copied target so the EXIF-laden JPEG cannot be auto-committed" >&2
exit 1
}
# ---------------------------------------------------------------------------
# Step 4: extract palette (does its own walk; idempotent on already-done photos)


@ -28,7 +28,9 @@ echo -n "Signing subkey passphrase: "
read -rs PASSPHRASE
echo
echo -n "$PASSPHRASE" | GNUPGHOME="$GNUPGHOME" "$GPG_PRESET" --homedir "$GNUPGHOME" --preset "$KEYGRIP"
# printf, not `echo -n`: a passphrase starting with -e/-n/-E would be
# eaten as an echo option.
printf '%s' "$PASSPHRASE" | GNUPGHOME="$GNUPGHOME" "$GPG_PRESET" --homedir "$GNUPGHOME" --preset "$KEYGRIP"
echo "Passphrase cached for keygrip $KEYGRIP (24 h TTL)."
echo "Test: GNUPGHOME=$GNUPGHOME gpg --homedir $GNUPGHOME --batch --detach-sign --armor --output /dev/null /dev/null"


@ -8,11 +8,29 @@ FREEZE="$REPO_ROOT/cabal.project.freeze"
cd "$REPO_ROOT"
# Back up the current freeze and restore it if resolution fails, so an
# unsolvable index never leaves the repo with no freeze file at all
# (recoverable via git, but the script shouldn't depend on that).
BACKUP=""
if [ -f "$FREEZE" ]; then
BACKUP="$(mktemp "$FREEZE.bak.XXXXXX")"
cp "$FREEZE" "$BACKUP"
fi
restore_on_failure() {
if [ -n "$BACKUP" ]; then
echo "==> Refreeze failed — restoring previous freeze file." >&2
mv "$BACKUP" "$FREEZE"
fi
}
trap restore_on_failure ERR
echo "==> Removing stale freeze file..."
rm -f "$FREEZE"
echo "==> Resolving dependencies and writing new freeze file..."
cabal freeze
trap - ERR
[ -n "$BACKUP" ] && rm -f "$BACKUP"
echo "==> Verifying build..."
cabal build
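The backup-and-restore flow around `cabal freeze` can be expressed as a small Python sketch, shown in Python for consistency with the other examples in this diff. The helper `regenerate_with_backup` and its `regenerate` callback are hypothetical; the callback stands in for the command that rewrites the file.

```python
import os
import shutil
import tempfile
from pathlib import Path
from typing import Callable


def regenerate_with_backup(target: Path, regenerate: Callable[[], None]) -> None:
    """Delete target and rebuild it, restoring the old copy if rebuilding fails."""
    backup = None
    if target.is_file():
        # Unique backup name next to the target, like mktemp "$FREEZE.bak.XXXXXX".
        fd, name = tempfile.mkstemp(prefix=target.name + ".bak.", dir=target.parent)
        os.close(fd)
        backup = Path(name)
        shutil.copy2(target, backup)
    try:
        target.unlink(missing_ok=True)
        regenerate()
    except BaseException:
        if backup is not None:
            # Put the previous file back, replacing any partial output.
            os.replace(backup, target)
        raise
    if backup is not None:
        backup.unlink()
```

On success the backup is deleted; on failure the previous file is back in place and the original exception propagates to the caller.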


@ -49,8 +49,19 @@ def stamp_file(path: str, replacement_bytes: bytes) -> bool:
data,
)
if count and new_data != data:
with open(path, "wb") as f:
# Write to a sibling temp file and os.replace so an interrupt
# mid-write never leaves a truncated deployed HTML file.
tmp = path + ".stamp-tmp"
try:
with open(tmp, "wb") as f:
f.write(new_data)
os.replace(tmp, path)
except BaseException:
try:
os.unlink(tmp)
except FileNotFoundError:
pass
raise
return True
return False

uv.lock

@ -156,6 +156,15 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/e7/05/c19819d5e3d95294a6f5947fb9b9629efb316b96de511b418c53d245aae6/cycler-0.12.1-py3-none-any.whl", hash = "sha256:85cef7cff222d8644161529808465972e51340599459b8ac3ccbac5a854e0d30", size = 8321, upload-time = "2023-10-07T05:32:16.783Z" },
]
[[package]]
name = "einops"
version = "0.8.2"
source = { registry = "https://pypi.org/simple" }
sdist = { url = "https://files.pythonhosted.org/packages/2c/77/850bef8d72ffb9219f0b1aac23fbc1bf7d038ee6ea666f331fa273031aa2/einops-0.8.2.tar.gz", hash = "sha256:609da665570e5e265e27283aab09e7f279ade90c4f01bcfca111f3d3e13f2827", size = 56261, upload-time = "2026-01-26T04:13:17.638Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/2a/09/f8d8f8f31e4483c10a906437b4ce31bdf3d6d417b73fe33f1a8b59e34228/einops-0.8.2-py3-none-any.whl", hash = "sha256:54058201ac7087911181bfec4af6091bb59380360f069276601256a76af08193", size = 65638, upload-time = "2026-01-26T04:13:18.546Z" },
]
[[package]]
name = "faiss-cpu"
version = "1.13.2"
@ -364,6 +373,7 @@ dependencies = [
{ name = "altair" },
{ name = "beautifulsoup4" },
{ name = "colorthief" },
{ name = "einops" },
{ name = "faiss-cpu" },
{ name = "matplotlib" },
{ name = "numpy" },
@ -379,6 +389,7 @@ requires-dist = [
{ name = "altair", specifier = ">=5.4,<6" },
{ name = "beautifulsoup4", specifier = ">=4.12,<5" },
{ name = "colorthief", specifier = ">=0.2,<1" },
{ name = "einops", specifier = ">=0.8.2,<1" },
{ name = "faiss-cpu", specifier = ">=1.9,<2" },
{ name = "matplotlib", specifier = ">=3.9,<4" },
{ name = "numpy", specifier = ">=2.0,<3" },