8 Commits

Author SHA1 Message Date
Levi Neuwirth 1027b88429 Rich reference popups: arXiv lead figures, prominent Wikipedia images
Reference popups (provider-rendered: arXiv, Wikipedia, CrossRef, …)
get a glanceable layout: wider container (560px), larger title and
body type, and a full-width image banner under the source label.
Internal page previews and item-card popups (new/library pages) keep
the compact layout — the shared popup element toggles
.link-popup--rich per show based on the rendered content.

- arXiv: a new best-effort enrich step fetches the paper's LaTeXML
  HTML rendition and pulls the first figure as a lead image. Enrich is
  time-boxed (1.8s) so the metadata popup is never held hostage; late
  results refresh the cache for the next hover. Figures letterbox with
  object-fit: contain (plots must not crop); Wikipedia photos
  cover-crop with an upper focal point. width/height attrs reserve
  aspect ratio so positioning is stable before the image loads.
- Wikipedia thumbnails request 480px for the banner width.
- nginx: new ^~ /proxy/arxiv-html/ location backed by arxiv.org proper
  (export.arxiv.org serves the Atom API but 429s the /html/ asset
  tree); 404s cached 1d (the common no-HTML-rendition case). All four
  proxy locations switched to ^~ — without it, static-assets.conf's
  per-extension regex location outranks plain prefixes and serves a
  local 404 for any proxied URL ending in an image extension, which is
  exactly how the first figure fetch failed.
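
The time-boxed enrich step in the arXiv bullet above can be sketched as follows. This is a minimal Python illustration of the pattern only, not the site's client code: `enrich_with_deadline`, `fetch_lead_figure`, and the `cache` dict are hypothetical names, and only the 1.8 s budget comes from this commit message.

```python
import asyncio

ENRICH_BUDGET_S = 1.8  # time box quoted in the commit message


async def enrich_with_deadline(key, fetch_lead_figure, cache,
                               budget_s=ENRICH_BUDGET_S):
    """Return the lead figure if it arrives within budget_s, else None.

    A late result is not thrown away: the fetch keeps running in the
    background and stores its result in `cache` for the next hover.
    (All names here are hypothetical.)
    """
    if key in cache:
        return cache[key]

    task = asyncio.ensure_future(fetch_lead_figure(key))

    def store(done):
        # Runs whenever the fetch finishes, early or late.
        if not done.cancelled() and done.exception() is None:
            cache[key] = done.result()

    task.add_done_callback(store)

    # asyncio.wait does not cancel the task when the timeout expires.
    done, _pending = await asyncio.wait({task}, timeout=budget_s)
    if task in done and task.exception() is None:
        return task.result()
    return None  # caller renders the metadata-only popup
```

The design point is that the deadline bounds only how long the popup waits, not how long the fetch may run, so a slow first hover still pays off on the second.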

Installed and verified live: proxied page (200, 298KB), figure (200
image/png), API unchanged, no-rendition 404 path; the full client
resolution chain (relative src -> proxy path -> guard -> image)
validated against production.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 13:02:15 -04:00
Levi Neuwirth 59fcc15ca6 nginx: preserve security baseline in every location; install on VPS
add_header is non-additive: any location declaring its own add_header
drops all server-context headers. archive.conf already re-included the
baseline for exactly this reason, but static-assets.conf (four cache
locations — including the JS/CSS responses where nosniff matters most)
and popup-proxy.conf (three proxy locations) did not. All seven now
re-include snippets/security-headers.conf.

Proxy locations additionally hide the upstream's own
STS/CSP/X-Frame-Options before re-adding ours: browsers honor only the
FIRST Strict-Transport-Security header (RFC 6797 §8.1), so arXiv's
max-age=300 passing through ahead of ours would have downgraded the
domain's cached HSTS policy on every popup fetch.
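
To make the downgrade concrete, here is a small Python model of the first-header rule from RFC 6797 §8.1. It is a sketch of how a user agent chooses the policy, not browser code; `effective_hsts_max_age` is a hypothetical helper, and the one-year value assumes the "1y" baseline means 31536000 seconds.

```python
import re


def effective_hsts_max_age(headers):
    """Return max-age from the FIRST Strict-Transport-Security header.

    `headers` is an ordered list of (name, value) pairs as received.
    Later STS headers are ignored, mirroring RFC 6797 section 8.1.
    Returns None if there is no STS header or no max-age directive.
    """
    for name, value in headers:
        if name.lower() == "strict-transport-security":
            m = re.search(r'max-age\s*=\s*"?(\d+)"?', value, re.IGNORECASE)
            return int(m.group(1)) if m else None
    return None


# Upstream header passes through ahead of ours: the short policy wins.
leaky = [
    ("Strict-Transport-Security", "max-age=300"),
    ("Strict-Transport-Security", "max-age=31536000; preload"),
]
# Upstream header hidden first: only the baseline remains.
fixed = [("Strict-Transport-Security", "max-age=31536000; preload")]

print(effective_hsts_max_age(leaky))  # 300
print(effective_hsts_max_age(fixed))  # 31536000
```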

Server side (installed + verified live): security-headers.conf and
archive.conf wired into the vhost in vhost.conf.example's canonical
order; nginx-mod-brotli installed and loaded, so the .br sidecars
compress-assets.sh has always shipped are now actually served
(Content-Encoding: br verified). CSP remains Report-Only. Verified
headers on /, /css/*.css (baseline + Cache-Control together),
/archive/ (baseline + X-Robots-Tag), and /proxy/* (baseline +
X-Cache-Status, single STS).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 12:11:46 -04:00
Levi Neuwirth 23250d8782 Fix popup previews: proxy prefix-strip bug, arXiv IDs, Wikipedia images
The root cause of 'PDF/arXiv previews simply do not work' was twofold:

1. nginx/popup-proxy.conf was never installed on the VPS — every
   /proxy/* request (arXiv, PubMed, Internet Archive) returned nginx's
   default 404. Now installed (snippets + http{}-context cache/limit
   zones in conf.d, included in the vhost, nginx -t verified, reloaded).
2. The snippet itself had a latent bug that only surfaced once
   installed: with a VARIABLE upstream, a URI part on proxy_pass is
   passed literally — every request hit the upstream's homepage
   (archive.org HTML where JSON was expected, arXiv 429s, NCBI doc-page
   redirects). Fixed with explicit prefix-strip rewrites; bad cached
   responses purged. All three proxies verified returning real data,
   including a live arXiv title resolve.

Client-side improvements:
- arXiv match covers old-style IDs (cs/9901002, math.GT/0309136,
  cond-mat/...v1) alongside new-style, and .pdf-suffixed /pdf/ URLs
  (regex verified against six forms)
- Wikipedia popups show the article's lead image: pageimages rides
  along the existing extracts call (pithumbsize=320), rendered via a
  new https-only image slot in renderPopup with float styling;
  upload.wikimedia.org added to the CSP's img-src
- pdf-thumbs now walks all of static/ (pdfjs pruned), so /cv.pdf and
  /resume.pdf — the most-linked internal PDFs, previously thumbnail-less
  and therefore popup-less — get hover previews
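
A rough idea of what the widened arXiv match has to accept, written as a Python sketch. The pattern below is an assumption for illustration, not the regex shipped in the client; `ARXIV_URL` and `arxiv_id` are hypothetical names, and the new-style ID in the comment is made up.

```python
import re

# Hypothetical pattern: accepts /abs/ and /pdf/ URLs, new- and old-style
# IDs, an optional version suffix, and an optional trailing ".pdf".
ARXIV_URL = re.compile(
    r"""^https?://(?:www\.)?arxiv\.org/(?:abs|pdf)/
        (?P<id>
            \d{4}\.\d{4,5}                  # new style, e.g. 2401.01234
          | [a-z-]+(?:\.[A-Z]{2})?/\d{7}    # old style: cs/9901002, math.GT/0309136
        )
        (?P<version>v\d+)?                  # optional version, e.g. v1
        (?:\.pdf)?                          # optional suffix on /pdf/ URLs
        (?:[?\#].*)?$""",
    re.VERBOSE,
)


def arxiv_id(url):
    """Return the version-less arXiv ID for a matching URL, else None."""
    m = ARXIV_URL.match(url)
    return m.group("id") if m else None
```

Stripping the version and the `.pdf` suffix before lookup means every URL form for a paper resolves to the same cache key.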

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 12:06:13 -04:00
Levi Neuwirth f11495ff9a Fix audit tooling/infra findings
- embed.py: pin nomic's auto_map modeling repo via code_revision —
  revision= alone left nomic-bert-2048 unpinned under
  trust_remote_code (AUDIT §1.3; verified loadable with
  HF_HUB_OFFLINE=1). Catch BadZipFile/EOFError when loading the page
  cache so a half-written npz is discarded, not fatal (§4.2), and
  unlink the tmp file on a failed save (§4.1)
- nginx: collapse the CSP to one physical line — nginx has no line
  continuation in quoted strings, so the old value embedded literal
  backslash+LF bytes, illegal in HTTP/2 (§8.1). Add the externals the
  site actually uses: KaTeX webfonts + onnxruntime wasm via jsdelivr,
  and the popup provider APIs popups.js documents (§8.2)
- Makefile: pathspec-limit the auto-commit to content/ so pre-staged
  unrelated work is no longer swept into auto: commits (§8.3)
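
The cache-handling part of the embed.py bullet follows a common pattern, sketched below in Python. This is not the code from embed.py: to stay standard-library-only it uses a plain zip archive holding JSON as a stand-in for the npz page cache (an npz file is itself a zip archive), and `load_cache`, `save_cache`, and `cache.json` are hypothetical names.

```python
import json
import os
import tempfile
import zipfile


def load_cache(path):
    """Return the cached dict, or {} if the file is missing or corrupt."""
    try:
        with zipfile.ZipFile(path) as zf:
            return json.loads(zf.read("cache.json"))
    except FileNotFoundError:
        return {}
    except (zipfile.BadZipFile, EOFError):
        # Half-written or truncated archive: treat it as empty instead of
        # crashing; the next successful save overwrites it.
        return {}


def save_cache(path, data):
    """Write to a temp file beside `path`, then rename it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    os.close(fd)
    try:
        with zipfile.ZipFile(tmp, "w") as zf:
            zf.writestr("cache.json", json.dumps(data))
        os.replace(tmp, path)  # atomic on the same filesystem
    except BaseException:
        # Failed save: remove the temp file so it does not accumulate.
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
```

Writing to a sibling temp file and renaming keeps readers from ever seeing a partial archive under the real name; the tolerant loader covers files damaged by other means.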

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 09:21:47 -04:00
Levi Neuwirth 77e31efdae Add link archive system: snapshots, backlinks, link-rot
Preserve external works the site cites against link rot, host them at
permanent /archive/<slug>/ URLs in site chrome, and treat them as
first-class citizens of the backlinks and similar-pages indexes.
Curated, not crawled: the author adds one line to archive/manifest.yaml
and the build fetches, hashes, snapshots, and indexes the work.

* archive/manifest.yaml + tools/archive.py (fetch / refresh / wayback /
  check / gc) — PDFs downloaded directly, HTML pages snapshotted with a
  vendored monolith (tools/bin/monolith @ 2.10.1) into a single
  self-contained file with the archive CSP and a noarchive robots meta
  injected. Per-entry PROVENANCE.json committed; gitignored .txt
  sidecars regenerated from the artifact's SHA-256.
* build/Archive.hs + build/ArchiveIndex.hs + build/Filters/Archive.hs
  — Hakyll rules for /archive/ and /archive/<slug>/, a body Pandoc
  filter that appends an archive affordance to live citations and
  flips dead ones to the local copy on archive.py check's asymmetric
  hysteresis (rotted needs 3 fails over >= 14 days; one ok recovers).
* build/Backlinks.hs — keeps archived external URLs through pass 1 and
  canonicalises them to /archive/<slug>/ in pass 2, producing a
  "Referenced by" section grouped by the fragment each citation
  targets. build/Stats.hs gains a "Link archive" telemetry block on
  /build/ (count, total size, median age, by-status / by-quality /
  by-visibility, orphans).
* Integrity: archive.py fetch and build/Archive.hs (via sha256sum)
  both re-hash every committed artifact, so a tampered file halts the
  build even with cabal invoked directly or no .venv present. refresh
  refuses to replace an uncommitted prior snapshot and rolls back
  atomically on any exit path. removed.yaml is honoured by fetch,
  wayback, and check using canonical-form (tracking-stripped,
  arXiv-canonicalised) comparison.
* visibility: private keeps an entry in-repo but undeployed.
  nginx/archive.conf emits X-Robots-Tag: noindex, noarchive for raw
  artifacts that cannot carry meta directives.
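
The asymmetric hysteresis mentioned above (rotted needs 3 fails over >= 14 days; one ok recovers) can be read as the following Python sketch. The interpretation is an assumption: it counts consecutive failures and measures the span from the first to the latest failure in the streak. `link_status` and the constants are hypothetical names, not taken from archive.py.

```python
from datetime import date, timedelta

FAILS_NEEDED = 3                 # from the commit message
MIN_SPAN = timedelta(days=14)    # from the commit message


def link_status(checks):
    """Classify a link from its check history.

    `checks` is a chronological list of (date, ok) pairs.
    Returns "rotted" or "ok".
    """
    streak = []  # dates of consecutive failures since the last success
    for day, ok in checks:
        if ok:
            streak = []          # a single success recovers the link
        else:
            streak.append(day)
    if len(streak) >= FAILS_NEEDED and streak[-1] - streak[0] >= MIN_SPAN:
        return "rotted"
    return "ok"


history = [
    (date(2026, 5, 1), False),
    (date(2026, 5, 8), False),
    (date(2026, 5, 15), False),
]
print(link_status(history))                               # rotted
print(link_status(history + [(date(2026, 5, 16), True)]))  # ok
```

The asymmetry is deliberate in this reading: a brief outage cannot flip a citation to the local copy, while a single successful check is enough to point it back at the live source.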

The full design, phase plan (1-5), and three refinement passes live
in ARCHIVE.md.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-23 10:06:33 -04:00
Levi Neuwirth 87819501a5 nginx: ship security baseline, reference vhost, and tighter cache
- Add nginx/security-headers.conf — server_tokens off, HSTS (1y +
  preload), X-Content-Type-Options, X-Frame-Options DENY,
  Referrer-Policy, Permissions-Policy, and a usage-scoped CSP. CSP
  ships in Report-Only mode; promote to enforcing once the report
  stream is clean for a week. CSP allowlists are derived from actual
  usage (cdn.jsdelivr.net for KaTeX/Vega, *.basemaps.cartocdn.com for
  Leaflet tiles); 'unsafe-inline' and 'unsafe-eval' are documented
  inline.
- Add nginx/vhost.conf.example — reference vhost showing the canonical
  include order. The live vhost on the VPS remains the source of
  truth; this file documents the structure so the VPS config can be
  reproduced or audited from the repo.
- Shorten unfingerprinted CSS/JS cache from 24h to 1h. Bug fixes ship
  to warm clients within an hour; if assets are ever fingerprinted,
  this can move to immutable.
- Refresh README repo layout — add nginx/ entry, drop stale paper/
  and spec.md references that never existed in the working tree.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 15:08:03 -04:00
Levi Neuwirth 6d2f9d12ae PDF compression 2026-04-22 12:40:22 -04:00
Levi Neuwirth 1a532f881b major visual changes - dingbats, footer, etc 2026-04-17 12:48:22 -04:00