Commit Graph

23 Commits

Author SHA1 Message Date
Levi Neuwirth 23250d8782 Fix popup previews: proxy prefix-strip bug, arXiv IDs, Wikipedia images
The root cause of 'PDF/arXiv previews simply do not work' was twofold:

1. nginx/popup-proxy.conf was never installed on the VPS — every
   /proxy/* request (arXiv, PubMed, Internet Archive) returned nginx's
   default 404. Now installed (snippets + http{}-context cache/limit
   zones in conf.d, included in the vhost, nginx -t verified, reloaded).
2. The snippet itself had a latent bug that only surfaced once
   installed: with a VARIABLE upstream, a URI part on proxy_pass is
   passed literally — every request hit the upstream's homepage
   (archive.org HTML where JSON was expected, arXiv 429s, NCBI doc-page
   redirects). Fixed with explicit prefix-strip rewrites; bad cached
   responses purged. All three proxies verified returning real data,
   including a live arXiv title resolve.

Client-side improvements:
- arXiv match covers old-style IDs (cs/9901002, math.GT/0309136,
  cond-mat/...v1) alongside new-style, and .pdf-suffixed /pdf/ URLs
  (regex verified against six forms)
- Wikipedia popups show the article's lead image: pageimages rides
  along the existing extracts call (pithumbsize=320), rendered via a
  new https-only image slot in renderPopup with float styling;
  upload.wikimedia.org added to the CSP's img-src
- pdf-thumbs now walks all of static/ (pdfjs pruned), so /cv.pdf and
  /resume.pdf — the most-linked internal PDFs, previously thumbnail-less
  and therefore popup-less — get hover previews

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 12:06:13 -04:00
Levi Neuwirth 5d344f940e Last audit stragglers: scaffolder, refreeze safety, atomic-write polish
- add-popup-source.sh: slug validated against ^[a-z0-9-]+$ before nginx
  interpolation; UPSTREAM_HOST derived unconditionally so the CSP
  reminder fires in the no-proxy case — which is exactly when the host
  must be added to connect-src (AUDIT §4.8)
- refreeze.sh: backs up the freeze and restores it on a failed resolve
  instead of leaving the repo with no freeze file (§4.9)
- einops gets the policy-mandated upper bound and a comment naming its
  consumer (nomic's remote modeling code) (§1.5)
- Makefile: pdftoppm failures warn instead of vanishing in the while
  pipeline; .NOTPARALLEL guards deploy's clean->build->sign ordering
  against -j invocations (§8.4)
- Atomic writers (embed, archive, the three sidecar extractors):
  PID-unique temp names so concurrent runs can't interleave, cleanup on
  failure everywhere, fsync where the artifact is not trivially
  regenerable (§4.10)

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 11:43:14 -04:00
Levi Neuwirth f11495ff9a Fix audit tooling/infra findings
- embed.py: pin nomic's auto_map modeling repo via code_revision —
  revision= alone left nomic-bert-2048 unpinned under
  trust_remote_code (AUDIT §1.3; verified loadable with
  HF_HUB_OFFLINE=1). Catch BadZipFile/EOFError when loading the page
  cache so a half-written npz is discarded, not fatal (§4.2), and
  unlink the tmp file on a failed save (§4.1)
- nginx: collapse the CSP to one physical line — nginx has no line
  continuation in quoted strings, so the old value embedded literal
  backslash+LF bytes, illegal in HTTP/2 (§8.1). Add the externals the
  site actually uses: KaTeX webfonts + onnxruntime wasm via jsdelivr,
  and the popup provider APIs popups.js documents (§8.2)
- Makefile: pathspec-limit the auto-commit to content/ so pre-staged
  unrelated work is no longer swept into auto: commits (§8.3)

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 09:21:47 -04:00
Levi Neuwirth fad8719045 Stamp the site-wide build time post-render
Hakyll caches per-page outputs, so pages whose dependencies have not
changed are not recompiled and the rendered $build-time$ in their
footer goes stale relative to a fresh build. The right granularity for
"last built at" is site-wide rather than per-page; wrapping the footer
timestamp in <span data-build-time> and rewriting it after Hakyll lets
every page reflect the current build without paying recompilation cost.

* tools/stamp-build-time.py walks _site/**/*.html after Hakyll runs and
  rewrites each element wrapped in [data-build-time] to the same format
  Contexts.hs:buildTimeField emits, so a fresh render and the post-pass
  agree.
* templates/partials/footer.html wraps $build-time$ in
  <span class="footer-build-time" data-build-time>...</span> so the
  sweep has a stable selector.
* Makefile invokes the sweep between embed.py and compress-assets so
  the .gz/.br sidecars include the fresh stamp.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-23 12:05:28 -04:00
Levi Neuwirth 154b47a4cb Marks II: broader monogram coverage + audit-marks tool
Extends the Phase-1 monogram mark system to every long-form content
type (essays, blog posts, poems, fiction, music) and introduces a
coverage audit so gaps are visible.

* build/Marks.hs gains hasMonogram (predicate), monogramSvgFieldFor +
  hasMonogramFieldFor (for explicit-path callers like the /build/ and
  /stats/ pages). Contexts.hs exports hasMonogramField as a siteCtx
  boolean so templates can conditionally render the slot without
  emitting an empty <div>.
* essay.html, blog-post.html, reading.html: hoist the frontmatter
  block out of <main id="markdownBody"> so the monogram + epistemic
  marks render as wrapper chrome rather than indexable prose; left
  + right mark slots are now unconditional (CSS handles the empty
  state) so the layout is grid-stable across pieces.
* templates/partials/item-card.html: optional monogram chip on cards
  (item-card--has-monogram modifier), gated on $has-monogram$ so
  monogram-less pieces stay flush.
* build/Stats.hs grows a "Marks coverage" telemetry section: per-type
  pieces / monogram / epistemic-figure counts + a coverage rollup,
  rendered between epistemic and output on /build/.
* tools/audit-marks.py: coverage report (ASCII table) walking
  content/**/*.md, plus a pre-commit hook at
  tools/hooks/pre-commit-marks.sh that runs the same scan against
  newly-staged .md files. New `make audit-marks` runs the report
  manually; the hook gates commits.
* static/css/marks.css: layout for the new frontmatter slots and the
  item-card monogram chip.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-23 12:05:08 -04:00
Levi Neuwirth 77e31efdae Add link archive system: snapshots, backlinks, link-rot
Preserve external works the site cites against link rot, host them at
permanent /archive/<slug>/ URLs in site chrome, and treat them as
first-class citizens of the backlinks and similar-pages indexes.
Curated, not crawled: the author adds one line to archive/manifest.yaml
and the build fetches, hashes, snapshots, and indexes the work.

* archive/manifest.yaml + tools/archive.py (fetch / refresh / wayback /
  check / gc) — PDFs downloaded directly, HTML pages snapshotted with a
  vendored monolith (tools/bin/monolith @ 2.10.1) into a single
  self-contained file with the archive CSP and a noarchive robots meta
  injected. Per-entry PROVENANCE.json committed; gitignored .txt
  sidecars regenerated from the artifact's SHA-256.
* build/Archive.hs + build/ArchiveIndex.hs + build/Filters/Archive.hs
  — Hakyll rules for /archive/ and /archive/<slug>/, a body Pandoc
  filter that appends an archive affordance to live citations and
  flips dead ones to the local copy on archive.py check's asymmetric
  hysteresis (rotted needs 3 fails over >= 14 days; one ok recovers).
* build/Backlinks.hs — keeps archived external URLs through pass 1 and
  canonicalises them to /archive/<slug>/ in pass 2, producing a
  "Referenced by" section grouped by the fragment each citation
  targets. build/Stats.hs gains a "Link archive" telemetry block on
  /build/ (count, total size, median age, by-status / by-quality /
  by-visibility, orphans).
* Integrity: archive.py fetch and build/Archive.hs (via sha256sum)
  both re-hash every committed artifact, so a tampered file halts the
  build even with cabal invoked directly or no .venv present. refresh
  refuses to replace an uncommitted prior snapshot and rolls back
  atomically on any exit path. removed.yaml is honoured by fetch,
  wayback, and check using canonical-form (tracking-stripped,
  arXiv-canonicalised) comparison.
* visibility: private keeps an entry in-repo but undeployed.
  nginx/archive.conf emits X-Robots-Tag: noindex, noarchive for raw
  artifacts that cannot carry meta directives.

The full design, phase plan (1-5), and three refinement passes live
in ARCHIVE.md.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-23 10:06:33 -04:00
Levi Neuwirth 339433db20 Quote rsync target variables in Makefile deploy
A VPS_PATH containing whitespace or shell metacharacters would split
on the unquoted expansion and hand rsync extra arguments. The
existing VPS_PATH guard rejects obviously dangerous parents (/srv,
/var, etc.) but does not catch this. Quoting fails closed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 15:08:23 -04:00
Levi Neuwirth cd94227acb Spec dilemma 2026-05-01 21:22:01 -04:00
Levi Neuwirth 42ba2bf972 Current rework 2026-04-26 19:42:47 -04:00
Levi Neuwirth 6585573dae States/Context/Embeddings fixes 2026-04-26 11:22:57 -04:00
Levi Neuwirth 6d2f9d12ae PDF compression 2026-04-22 12:40:22 -04:00
Levi Neuwirth 3a95a05284 Fix broken PDF hyperlinks 2026-04-22 12:10:31 -04:00
Levi Neuwirth 913a374fb2 Professional content refactor 2026-04-22 11:46:57 -04:00
Levi Neuwirth acb3ae7066 visual enhancements 2026-04-15 22:25:38 -04:00
Levi Neuwirth b02e1e868d audit: tooling, deploy ordering, README, repo hygiene 2026-04-10 17:41:33 -04:00
Levi Neuwirth c864e2f9cc makefile corrections + esoteric math rendering 2026-04-05 12:00:07 -04:00
Levi Neuwirth 1be6292757 ToC fix 2026-03-27 16:19:52 -04:00
Levi Neuwirth a5495035be epistemic v2 2026-03-26 09:10:35 -04:00
Levi Neuwirth 728afd4c68 affiliation, cabal helper script 2026-03-26 08:14:50 -04:00
Levi Neuwirth 5cfbfbc0ef GPG signing, embedding pipeline, visualization filter, search timing, sig popups
- GPG page signing: dedicated signing subkey in ~/.gnupg-signing, sign-site.sh
  walks _site/**/*.html producing .sig files, preset-signing-passphrase.sh caches
  passphrase via gpg-preset-passphrase; make sign target; make deploy chains it
- Footer sig link: $url$.sig with hover popup showing ASCII armor (popups.js
  sigContent provider; .footer-sig-link bound explicitly to bypass footer exclusion)
- Public key at static/gpg/pubkey.asc
- Embedding pipeline: tools/embed.py encodes _site pages with nomic-embed-text-v1.5
  + FAISS IndexFlatIP, writes data/similar-links.json; staleness check skips when
  JSON is newer than all HTML; make build invokes via uv, skips gracefully if .venv absent
- SimilarLinks.hs: similarLinksField loads similar-links.json with Hakyll dependency
  tracking; renders Related section in page-footer.html
- uv environment: pyproject.toml + uv.lock (CPU-only torch via pytorch-cpu index)
- Visualization filter: Filters/Viz.hs runs Python scripts for .figure (SVG) and
  .visualization (Vega-Lite JSON) fenced divs; viz.js renders with monochrome config
  and MutationObserver dark-mode re-render; viz.css layout
- Search timing: #search-timing element shows elapsed ms via MutationObserver
- Build telemetry timestamps removed from git tracking (now in .gitignore)
- spec.md updated to v9; WRITING.md updated with viz, related, signing, build docs

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-20 20:14:49 -04:00
Levi Neuwirth 26c067147a Build telemetry 2026-03-19 15:27:12 -04:00
Levi Neuwirth 9c47811372 Administrativa. 2026-03-17 22:31:24 -04:00
Levi Neuwirth 714824a0b5 initial deploy! whoop 2026-03-17 21:56:14 -04:00