Reference popups (provider-rendered: arXiv, Wikipedia, CrossRef, …)
get a glanceable layout: wider container (560px), larger title and
body type, and a full-width image banner under the source label.
Internal page previews and item-card popups (new/library pages) keep
the compact layout: the shared popup element toggles
.link-popup--rich on each show, based on the rendered content.
- arXiv: a new best-effort enrich step fetches the paper's LaTeXML
HTML rendition and pulls the first figure as a lead image. Enrich is
time-boxed (1.8s) so the metadata popup is never held hostage; late
results refresh the cache for the next hover. Figures letterbox with
object-fit: contain (plots must not crop); Wikipedia photos
cover-crop with an upper focal point. width/height attrs reserve
aspect ratio so positioning is stable before the image loads.
- Wikipedia thumbnails request 480px for the banner width.
- nginx: new ^~ /proxy/arxiv-html/ location backed by arxiv.org proper
(export.arxiv.org serves the Atom API but 429s the /html/ asset
tree); 404s cached 1d (the common no-HTML-rendition case). All four
proxy locations switched to ^~ — without it, static-assets.conf's
per-extension regex location outranks plain prefixes and serves a
local 404 for any proxied URL ending in an image extension, which is
exactly how the first figure fetch failed.
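
The time-boxed enrich from the arXiv bullet can be sketched roughly as
below. This is an illustration in Python, not the site's client-side
code: fetch_metadata, enrich_with_figure, and the cache dict are
hypothetical names, and only the 1.8s budget comes from the text above.

```python
import asyncio

ENRICH_BUDGET_S = 1.8  # time box taken from the commit text

cache: dict = {}  # hypothetical popup cache keyed by arXiv ID


async def popup_data(arxiv_id, fetch_metadata, enrich_with_figure,
                     budget=ENRICH_BUDGET_S):
    """Return metadata promptly; attach the lead figure only if it arrives in time."""
    meta = dict(await fetch_metadata(arxiv_id))
    task = asyncio.create_task(enrich_with_figure(arxiv_id))

    def store_late(t):
        # Runs whenever enrich finishes, even after the popup was shown,
        # so the next hover finds the image in the cache.
        if not t.cancelled() and t.exception() is None and t.result():
            cache[arxiv_id] = {**meta, "image": t.result()}

    task.add_done_callback(store_late)

    # asyncio.wait (unlike wait_for) leaves the task running on timeout.
    done, _pending = await asyncio.wait({task}, timeout=budget)
    if task in done and task.exception() is None and task.result():
        meta["image"] = task.result()

    cache.setdefault(arxiv_id, meta)
    return meta
```

The sketch uses asyncio.wait rather than wait_for so that a slow
enrich is not cancelled and its result can still refresh the cache.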
Installed and verified live: proxied page (200, 298KB), figure (200
image/png), API unchanged, no-rendition 404 path; the full client
resolution chain (relative src -> proxy path -> guard -> image)
validated against production.
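
Why ^~ matters for the proxy locations can be shown with a simplified
model of nginx's location selection. It ignores exact (=) matches and
case-insensitive regexes; the extension regex and the example URI are
assumptions for illustration, not the contents of static-assets.conf.

```python
import re
from dataclasses import dataclass


@dataclass
class Location:
    kind: str     # "prefix", "prefix_noregex" (the ^~ modifier), or "regex"
    pattern: str
    name: str


def select_location(uri, locations):
    """Simplified nginx rule: longest prefix is remembered, then regexes are
    tried in order unless that longest prefix carries ^~."""
    prefixes = [l for l in locations
                if l.kind in ("prefix", "prefix_noregex")
                and uri.startswith(l.pattern)]
    best = max(prefixes, key=lambda l: len(l.pattern), default=None)
    if best is not None and best.kind == "prefix_noregex":
        return best.name  # ^~ stops the regex search
    for l in locations:
        if l.kind == "regex" and re.search(l.pattern, uri):
            return l.name  # first matching regex wins over plain prefixes
    return best.name if best is not None else None


# Hypothetical per-extension regex and figure URI.
STATIC_RE = r"\.(png|jpe?g|gif|svg|css|js)$"
FIGURE_URI = "/proxy/arxiv-html/0000.00000v1/x1.png"
```

With a plain prefix location the figure URI in this model resolves to
the static-assets regex; switching the prefix to ^~ keeps it in the
proxy location.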
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
add_header is non-additive: any location declaring its own add_header
drops all server-context headers. archive.conf already re-included the
baseline for exactly this reason, but static-assets.conf (four cache
locations — including the JS/CSS responses where nosniff matters most)
and popup-proxy.conf (three proxy locations) did not. All seven now
re-include snippets/security-headers.conf.
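
The inheritance rule can be modelled in a few lines. This is a sketch
of the behaviour described above, not nginx code; the header values
are illustrative.

```python
def effective_headers(server_headers, location_headers):
    """add_header values are inherited from the outer level only when the
    current level declares none of its own."""
    if location_headers:
        return dict(location_headers)
    return dict(server_headers)


# Illustrative values only.
BASELINE = {"X-Content-Type-Options": "nosniff"}
CACHE_ONLY = {"Cache-Control": "max-age=3600"}

# Before: the cache location silently loses the baseline.
before = effective_headers(BASELINE, CACHE_ONLY)

# After: re-including the baseline inside the location restores it.
after = effective_headers(BASELINE, {**BASELINE, **CACHE_ONLY})
```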
Proxy locations additionally hide the upstream's own
STS/CSP/X-Frame-Options before re-adding ours: browsers honor only the
FIRST Strict-Transport-Security header (RFC 6797 §8.1), so arXiv's
max-age=300 passing through ahead of ours would have downgraded the
domain's cached HSTS policy on every popup fetch.
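
A small model of the first-header-wins rule shows the downgrade. The
function names are hypothetical, and our STS value is reduced to its
max-age for brevity.

```python
UPSTREAM_STS = "max-age=300"      # arXiv's value, per the text above
OUR_STS = "max-age=31536000"      # 1y; other directives omitted here

HIDDEN = {"strict-transport-security", "content-security-policy",
          "x-frame-options"}


def effective_sts(headers):
    """Return the STS value a compliant browser processes: the first one."""
    for name, value in headers:
        if name.lower() == "strict-transport-security":
            return value
    return None


def proxy_response(upstream_headers, added, hidden=frozenset()):
    """Upstream headers pass through first (minus hidden ones), then ours."""
    kept = [(n, v) for n, v in upstream_headers if n.lower() not in hidden]
    return kept + list(added)


upstream = [("Strict-Transport-Security", UPSTREAM_STS)]
ours = [("Strict-Transport-Security", OUR_STS)]

leaky = proxy_response(upstream, ours)           # two STS headers, theirs first
fixed = proxy_response(upstream, ours, HIDDEN)   # single STS header, ours
```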
Server side (installed + verified live): security-headers.conf and
archive.conf wired into the vhost in vhost.conf.example's canonical
order; nginx-mod-brotli installed and loaded, so the .br sidecars
compress-assets.sh has always shipped are now actually served
(Content-Encoding: br verified). CSP remains Report-Only. Verified
headers on /, /css/*.css (baseline + Cache-Control together),
/archive/ (baseline + X-Robots-Tag), and /proxy/* (baseline +
X-Cache-Status, single STS).
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The root cause of 'PDF/arXiv previews simply do not work' was twofold:
1. nginx/popup-proxy.conf was never installed on the VPS — every
/proxy/* request (arXiv, PubMed, Internet Archive) returned nginx's
default 404. Now installed (snippets + http{}-context cache/limit
zones in conf.d, included in the vhost, nginx -t verified, reloaded).
2. The snippet itself had a latent bug that only surfaced once
installed: with a variable upstream, the URI part on proxy_pass
replaces the request URI verbatim, so every request hit the upstream's
homepage
(archive.org HTML where JSON was expected, arXiv 429s, NCBI doc-page
redirects). Fixed with explicit prefix-strip rewrites; bad cached
responses purged. All three proxies verified returning real data,
including a live arXiv title resolve.
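
The proxy_pass behaviour in item 2 can be sketched as a model of which
URI reaches the upstream. The /proxy/arxiv/ prefix and the query are
assumptions for illustration; the real snippet's locations may differ.

```python
def uri_with_literal_part(request_uri, proxy_pass_uri):
    """Variable upstream plus a URI part on proxy_pass: the directive's URI
    is sent as-is and the request URI is discarded."""
    return proxy_pass_uri


def uri_with_prefix_strip(request_uri, prefix):
    """Explicit rewrite: strip the proxy prefix from the path, keep the
    query string, and pass the result upstream."""
    path, sep, query = request_uri.partition("?")
    if not path.startswith(prefix):
        return request_uri
    return "/" + path[len(prefix):] + sep + query


# Hypothetical request through a hypothetical /proxy/arxiv/ location.
REQUEST = "/proxy/arxiv/api/query?id_list=0000.00000"

broken = uri_with_literal_part(REQUEST, "/")
fixed = uri_with_prefix_strip(REQUEST, "/proxy/arxiv/")
```

In this model the broken form always requests "/", which matches the
homepage responses listed above.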
Client-side improvements:
- arXiv match covers old-style IDs (cs/9901002, math.GT/0309136,
cond-mat/...v1) alongside new-style, and .pdf-suffixed /pdf/ URLs
(regex verified against six forms)
- Wikipedia popups show the article's lead image: pageimages rides
along the existing extracts call (pithumbsize=320), rendered via a
new https-only image slot in renderPopup with float styling;
upload.wikimedia.org added to the CSP's img-src
- pdf-thumbs now walks all of static/ (pdfjs pruned), so /cv.pdf and
/resume.pdf — the most-linked internal PDFs, previously thumbnail-less
and therefore popup-less — get hover previews
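
A matcher covering the ID forms in the first bullet might look like the
following. It is a sketch, not the regex shipped in the client: the
new-style ID and the cond-mat number in the usage note are made up, and
query strings or fragments are not handled.

```python
import re

ARXIV_RE = re.compile(
    r"""https?://(?:www\.)?arxiv\.org/(?:abs|pdf)/
        (?P<id>
            \d{4}\.\d{4,5}                    # new-style: YYMM.NNNN(N)
          | [a-z-]+(?:\.[A-Z]{2})?/\d{7}      # old-style: archive[.SC]/YYMMNNN
        )
        (?P<version>v\d+)?                    # optional version suffix
        (?:\.pdf)?                            # optional .pdf on /pdf/ URLs
    """,
    re.VERBOSE,
)


def parse_arxiv(url):
    """Return (id, version) for a recognised arXiv URL, else None."""
    m = ARXIV_RE.fullmatch(url)
    if m is None:
        return None
    return m.group("id"), m.group("version")
```

For example, parse_arxiv("https://arxiv.org/abs/math.GT/0309136")
yields ("math.GT/0309136", None), and a hypothetical
"https://arxiv.org/abs/cond-mat/0000000v1" yields
("cond-mat/0000000", "v1").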
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
- embed.py: pin nomic's auto_map modeling repo via code_revision —
revision= alone left nomic-bert-2048 unpinned under
trust_remote_code (AUDIT §1.3; verified loadable with
HF_HUB_OFFLINE=1). Catch BadZipFile/EOFError when loading the page
cache so a half-written npz is discarded, not fatal (§4.2), and
unlink the tmp file on a failed save (§4.1)
- nginx: collapse the CSP to one physical line — nginx has no line
continuation in quoted strings, so the old value embedded literal
backslash+LF bytes, illegal in HTTP/2 (§8.1). Add the externals the
site actually uses: KaTeX webfonts + onnxruntime wasm via jsdelivr,
and the popup provider APIs popups.js documents (§8.2)
- Makefile: pathspec-limit the auto-commit to content/ so pre-staged
unrelated work is no longer swept into auto: commits (§8.3)
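
The cache handling in the embed.py bullet follows a common pattern,
sketched here with the standard library only. An .npz file is a zip
archive, so zipfile stands in for numpy's loader; load_cache and
save_cache are hypothetical names, and deleting the corrupt file is
one possible reading of "discarded".

```python
import os
import tempfile
import zipfile


def load_cache(path):
    """Return {member: bytes}, or None if the cache is missing or truncated."""
    try:
        with zipfile.ZipFile(path) as zf:
            return {name: zf.read(name) for name in zf.namelist()}
    except FileNotFoundError:
        return None
    except (zipfile.BadZipFile, EOFError):
        # Half-written archive: drop it and let the caller rebuild.
        os.unlink(path)
        return None


def save_cache(path, members):
    """Write to a temp file in the same directory, then rename into place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    os.close(fd)
    try:
        with zipfile.ZipFile(tmp, "w") as zf:
            for name, data in members.items():
                zf.writestr(name, data)
        os.replace(tmp, path)  # atomic on the same filesystem
    except BaseException:
        # Failed save: do not leave the temp file behind.
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
```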
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Preserve external works the site cites against link rot, host them at
permanent /archive/<slug>/ URLs in site chrome, and treat them as
first-class citizens of the backlinks and similar-pages indexes.
Curated, not crawled: the author adds one line to archive/manifest.yaml
and the build fetches, hashes, snapshots, and indexes the work.
* archive/manifest.yaml + tools/archive.py (fetch / refresh / wayback /
check / gc) — PDFs downloaded directly, HTML pages snapshotted with a
vendored monolith (tools/bin/monolith @ 2.10.1) into a single
self-contained file with the archive CSP and a noarchive robots meta
injected. Per-entry PROVENANCE.json committed; gitignored .txt
sidecars regenerated from the artifact's SHA-256.
* build/Archive.hs + build/ArchiveIndex.hs + build/Filters/Archive.hs
— Hakyll rules for /archive/ and /archive/<slug>/, a body Pandoc
filter that appends an archive affordance to live citations and
flips dead ones to the local copy. Liveness comes from archive.py
check, which applies asymmetric hysteresis: a link is marked rotted
only after 3 fails over >= 14 days, while a single ok recovers it.
* build/Backlinks.hs — keeps archived external URLs through pass 1 and
canonicalises them to /archive/<slug>/ in pass 2, producing a
"Referenced by" section grouped by the fragment each citation
targets. build/Stats.hs gains a "Link archive" telemetry block on
/build/ (count, total size, median age, by-status / by-quality /
by-visibility, orphans).
* Integrity: archive.py fetch and build/Archive.hs (via sha256sum)
both re-hash every committed artifact, so a tampered file halts the
build even with cabal invoked directly or no .venv present. refresh
refuses to replace an uncommitted prior snapshot and rolls back
atomically on any exit path. removed.yaml is honoured by fetch,
wayback, and check using canonical-form (tracking-stripped,
arXiv-canonicalised) comparison.
* visibility: private keeps an entry in-repo but undeployed.
nginx/archive.conf emits X-Robots-Tag: noindex, noarchive for raw
artifacts that cannot carry meta directives.
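
The asymmetric hysteresis used by check can be sketched as a small
state machine. Only the thresholds (3 fails, >= 14 days, one ok) come
from the text; counting consecutive fails and measuring the span from
the first fail in the run are assumptions, as are all names here.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

MIN_FAILS = 3
MIN_SPAN = timedelta(days=14)


@dataclass
class LinkState:
    status: str = "live"              # "live" or "rotted"
    fails: int = 0                    # consecutive failed checks (assumed)
    first_fail: Optional[date] = None


def record_check(state, ok, today):
    """Fold one check result into the link state."""
    if ok:
        return LinkState()            # a single success recovers the link
    first = state.first_fail or today
    fails = state.fails + 1
    rotted = state.status == "rotted" or (
        fails >= MIN_FAILS and today - first >= MIN_SPAN
    )
    return LinkState("rotted" if rotted else "live", fails, first)
```

Requiring both a count and a time span means a short outage with many
retries does not flip a citation to the local copy.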
The full design, phase plan (1-5), and three refinement passes live
in ARCHIVE.md.
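
The re-hash step from the Integrity bullet amounts to something like
the sketch below. The expected mapping of path to digest is a
hypothetical stand-in for whatever PROVENANCE.json records.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Stream the file through SHA-256 and return the hex digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            h.update(block)
    return h.hexdigest()


def tampered_artifacts(expected):
    """Given {path: recorded hex digest}, return the paths whose current
    content no longer matches; a build would halt if this is non-empty."""
    return [path for path, digest in expected.items()
            if sha256_of(path) != digest]
```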
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- Add nginx/security-headers.conf — server_tokens off, HSTS (1y +
preload), X-Content-Type-Options, X-Frame-Options DENY,
Referrer-Policy, Permissions-Policy, and a usage-scoped CSP. CSP
ships in Report-Only mode; promote to enforcing once the report
stream is clean for a week. CSP allowlists are derived from actual
usage (cdn.jsdelivr.net for KaTeX/Vega, *.basemaps.cartocdn.com for
Leaflet tiles); 'unsafe-inline' and 'unsafe-eval' are documented
inline.
- Add nginx/vhost.conf.example — reference vhost showing the canonical
include order. The live vhost on the VPS remains the source of
truth; this file documents the structure so the VPS config can be
reproduced or audited from the repo.
- Shorten unfingerprinted CSS/JS cache from 24h to 1h. Bug fixes ship
to warm clients within an hour; if assets are ever fingerprinted,
this can move to immutable.
- Refresh README repo layout: add an nginx/ entry and drop stale
references to paper/ and spec.md, neither of which ever existed in
the working tree.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>