Commit Graph

6 Commits

Author SHA1 Message Date
Levi Neuwirth 23250d8782 Fix popup previews: proxy prefix-strip bug, arXiv IDs, Wikipedia images
The root cause of 'PDF/arXiv previews simply do not work' was twofold:

1. nginx/popup-proxy.conf was never installed on the VPS — every
   /proxy/* request (arXiv, PubMed, Internet Archive) returned nginx's
   default 404. Now installed (snippets + http{}-context cache/limit
   zones in conf.d, included in the vhost, nginx -t verified, reloaded).
2. The snippet itself had a latent bug that only surfaced once
   installed: with a VARIABLE upstream, a URI part on proxy_pass is
   passed literally — every request hit the upstream's homepage
   (archive.org HTML where JSON was expected, arXiv 429s, NCBI doc-page
   redirects). Fixed with explicit prefix-strip rewrites; bad cached
   responses purged. All three proxies verified returning real data,
   including a live arXiv title resolve.

Client-side improvements:
- arXiv match covers old-style IDs (cs/9901002, math.GT/0309136,
  cond-mat/...v1) alongside new-style, and .pdf-suffixed /pdf/ URLs
  (regex verified against six forms)
- Wikipedia popups show the article's lead image: pageimages rides
  along the existing extracts call (pithumbsize=320), rendered via a
  new https-only image slot in renderPopup with float styling;
  upload.wikimedia.org added to the CSP's img-src
- pdf-thumbs now walks all of static/ (pdfjs pruned), so /cv.pdf and
  /resume.pdf — the most-linked internal PDFs, previously thumbnail-less
  and therefore popup-less — get hover previews

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 12:06:13 -04:00
Levi Neuwirth f11495ff9a Fix audit tooling/infra findings
- embed.py: pin nomic's auto_map modeling repo via code_revision —
  revision= alone left nomic-bert-2048 unpinned under
  trust_remote_code (AUDIT §1.3; verified loadable with
  HF_HUB_OFFLINE=1). Catch BadZipFile/EOFError when loading the page
  cache so a half-written npz is discarded, not fatal (§4.2), and
  unlink the tmp file on a failed save (§4.1)
- nginx: collapse the CSP to one physical line — nginx has no line
  continuation in quoted strings, so the old value embedded literal
  backslash+LF bytes, illegal in HTTP/2 (§8.1). Add the externals the
  site actually uses: KaTeX webfonts + onnxruntime wasm via jsdelivr,
  and the popup provider APIs popups.js documents (§8.2)
- Makefile: pathspec-limit the auto-commit to content/ so pre-staged
  unrelated work is no longer swept into auto: commits (§8.3)

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 09:21:47 -04:00
Levi Neuwirth 77e31efdae Add link archive system: snapshots, backlinks, link-rot
Preserve external works the site cites against link rot, host them at
permanent /archive/<slug>/ URLs in site chrome, and treat them as
first-class citizens of the backlinks and similar-pages indexes.
Curated, not crawled: the author adds one line to archive/manifest.yaml
and the build fetches, hashes, snapshots, and indexes the work.

* archive/manifest.yaml + tools/archive.py (fetch / refresh / wayback /
  check / gc) — PDFs downloaded directly, HTML pages snapshotted with a
  vendored monolith (tools/bin/monolith @ 2.10.1) into a single
  self-contained file with the archive CSP and a noarchive robots meta
  injected. Per-entry PROVENANCE.json committed; gitignored .txt
  sidecars regenerated from the artifact's SHA-256.
* build/Archive.hs + build/ArchiveIndex.hs + build/Filters/Archive.hs
  — Hakyll rules for /archive/ and /archive/<slug>/, a body Pandoc
  filter that appends an archive affordance to live citations and
  flips dead ones to the local copy on archive.py check's asymmetric
  hysteresis (rotted needs 3 fails over >= 14 days; one ok recovers).
* build/Backlinks.hs — keeps archived external URLs through pass 1 and
  canonicalises them to /archive/<slug>/ in pass 2, producing a
  "Referenced by" section grouped by the fragment each citation
  targets. build/Stats.hs gains a "Link archive" telemetry block on
  /build/ (count, total size, median age, by-status / by-quality /
  by-visibility, orphans).
* Integrity: archive.py fetch and build/Archive.hs (via sha256sum)
  both re-hash every committed artifact, so a tampered file halts the
  build even with cabal invoked directly or no .venv present. refresh
  refuses to replace an uncommitted prior snapshot and rolls back
  atomically on any exit path. removed.yaml is honoured by fetch,
  wayback, and check using canonical-form (tracking-stripped,
  arXiv-canonicalised) comparison.
* visibility: private keeps an entry in-repo but undeployed.
  nginx/archive.conf emits X-Robots-Tag: noindex, noarchive for raw
  artifacts that cannot carry meta directives.

The full design, phase plan (1-5), and three refinement passes live
in ARCHIVE.md.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-23 10:06:33 -04:00
Levi Neuwirth 87819501a5 nginx: ship security baseline, reference vhost, and tighter cache
- Add nginx/security-headers.conf — server_tokens off, HSTS (1y +
  preload), X-Content-Type-Options, X-Frame-Options DENY,
  Referrer-Policy, Permissions-Policy, and a usage-scoped CSP. CSP
  ships in Report-Only mode; promote to enforcing once the report
  stream is clean for a week. CSP allowlists are derived from actual
  usage (cdn.jsdelivr.net for KaTeX/Vega, *.basemaps.cartocdn.com for
  Leaflet tiles); 'unsafe-inline' and 'unsafe-eval' are documented
  inline.
- Add nginx/vhost.conf.example — reference vhost showing the canonical
  include order. The live vhost on the VPS remains the source of
  truth; this file documents the structure so the VPS config can be
  reproduced or audited from the repo.
- Shorten unfingerprinted CSS/JS cache from 24h to 1h. Bug fixes ship
  to warm clients within an hour; if assets are ever fingerprinted,
  this can move to immutable.
- Refresh README repo layout — add nginx/ entry, drop stale paper/
  and spec.md references that never existed in the working tree.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 15:08:03 -04:00
Levi Neuwirth 6d2f9d12ae PDF compression 2026-04-22 12:40:22 -04:00
Levi Neuwirth 1a532f881b major visual changes - dingbats, footer, etc 2026-04-17 12:48:22 -04:00