Commit Graph

4 Commits

Author SHA1 Message Date
Levi Neuwirth 5d344f940e Last audit stragglers: scaffolder, refreeze safety, atomic-write polish
- add-popup-source.sh: slug validated against ^[a-z0-9-]+$ before nginx
  interpolation; UPSTREAM_HOST derived unconditionally so the CSP
  reminder fires in the no-proxy case — which is exactly when the host
  must be added to connect-src (AUDIT §4.8)
- refreeze.sh: backs up the freeze and restores it on a failed resolve
  instead of leaving the repo with no freeze file (§4.9)
- einops gets the policy-mandated version upper bound and a comment
  naming its consumer (nomic's remote modeling code) (§1.5)
- Makefile: pdftoppm failures warn instead of vanishing in the while
  pipeline; .NOTPARALLEL guards deploy's clean->build->sign ordering
  against -j invocations (§8.4)
- Atomic writers (embed, archive, the three sidecar extractors):
  PID-unique temp names so concurrent runs can't interleave, cleanup on
  failure everywhere, fsync where the artifact is not trivially
  regenerable (§4.10)

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 11:43:14 -04:00
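
The atomic-writer bullet above (PID-unique temp names, cleanup on failure, fsync for artifacts that are costly to regenerate) could look roughly like the following. This is a minimal sketch, not the repository's code: the function name atomic_write, the durable flag, and the .tmp suffix are assumptions made for illustration.

```python
import os
from pathlib import Path


def atomic_write(path: Path, data: bytes, durable: bool = False) -> None:
    """Write data so readers see the old file or the new one, never a partial one.

    Hypothetical helper; the name and the `durable` flag are illustrative.
    """
    # PID-unique temp name in the target's own directory: concurrent runs do
    # not share a temp file, and os.replace stays on a single filesystem.
    tmp = path.with_name(f"{path.name}.{os.getpid()}.tmp")
    try:
        with open(tmp, "wb") as fh:
            fh.write(data)
            if durable:
                # Push the bytes to disk before the rename makes them visible.
                fh.flush()
                os.fsync(fh.fileno())
        os.replace(tmp, path)
    except BaseException:
        # Cleanup on every failure path, then re-raise the original error.
        try:
            os.unlink(tmp)
        except FileNotFoundError:
            pass
        raise
```

Passing durable=True would suit artifacts that cannot be cheaply rebuilt; cheap, regenerable outputs can skip the fsync cost.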
Levi Neuwirth 9f61ce5949 Tooling, manifest, and content polish
- import-photo.sh deletes the copied JPEG when EXIF stripping fails, so
  the auto-commit can never publish GPS/serial metadata (AUDIT §4.11)
- pre-commit-marks hook: tab-aware path parsing, probes the staged blob
  rather than the working tree (§4.11)
- preset-signing-passphrase uses printf; stamp-build-time writes via
  temp + os.replace; archive.py passes -- to pdftotext and verifies the
  vendored monolith binary against its recorded sha256 (mismatch is
  fatal, consistent with the tool's integrity contract); extract-exif
  prefixes relative paths with ./ (§4.11)
- blog-post.html: id="similar-links"/"backlinks" each appear once;
  rendered output unchanged (§6.4)
- site.webmanifest: start_url/scope/description added; "maskable" icon
  purpose restored alongside "any" (§9.3)
- Frontmatter cleanup: scaffold comments out of scaling_outage,
  dangling null confidence-history keys removed (populated ones kept),
  dead modified: key dropped from colophon (§6.4)
- canto31.jpg: 4.0 MB -> 1.9 MB (2400px, q80, grayscale — the source
  is a monochrome Doré engraving, so single-channel is colorimetrically
  lossless); webp sidecar regenerated (§6.4, prior-audit §6.1)

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 11:13:34 -04:00
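
The check described above for the vendored monolith binary (compare against a recorded sha256, treat a mismatch as fatal) can be sketched as follows. The helper names sha256_of and require_sha256 are hypothetical, and where the recorded digest is stored is not stated in the commit, so it is passed in as a plain argument here.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large binaries are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def require_sha256(path: Path, expected_hex: str) -> None:
    """Exit the process if the file does not match its recorded digest.

    Hypothetical helper: a mismatch is fatal rather than a warning.
    """
    actual = sha256_of(path)
    if actual != expected_hex.lower():
        raise SystemExit(
            f"{path}: sha256 mismatch (expected {expected_hex}, got {actual})"
        )
```

Raising SystemExit with a message makes the mismatch stop the run with a non-zero exit status, which matches the "mismatch is fatal" wording.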
Levi Neuwirth c17c203747 Tooling robustness: atomic writes, verified downloads
- archive.py: PROVENANCE.json / archive-index.json / archive-state.json
  now written atomically (tmp + os.replace) — a truncated integrity
  record is the one thing this tool must never produce (AUDIT §4.4);
  manifest entries validated as mappings up front (§4.7); refresh
  rejects provenance with a missing/empty artifact key instead of
  crashing on IsADirectoryError (§4.7); wayback save URL quotes
  unsafe characters (§4.7)
- download-leaflet.sh: existing files are re-verified before being
  skipped, and downloads land in a .part temp moved into place only
  after checksum verification — a failed verification can no longer
  leave a bad file that the next run silently accepts (§4.5)
- download-model.sh, convert-images.sh: same temp-then-move pattern so
  interrupted downloads/conversions never persist at final paths (§4.6)

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 09:43:25 -04:00
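
The download pattern in this commit (re-verify an existing file before skipping it, land new downloads in a .part temp, move into place only after the checksum passes) is implemented in shell in the repository. The Python sketch below only illustrates the control flow; ensure_verified and its fetch callback are assumed names, and the callback stands in for whatever actually performs the transfer.

```python
import hashlib
import os
from pathlib import Path
from typing import Callable


def file_sha256(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()


def ensure_verified(
    dest: Path, expected_sha256: str, fetch: Callable[[Path], None]
) -> bool:
    """Return False if the existing file verified, True if a download happened.

    `fetch` is a hypothetical callback that writes the download to the path
    it is given.
    """
    # An existing file is trusted only after it is re-hashed.
    if dest.exists() and file_sha256(dest) == expected_sha256:
        return False
    part = dest.with_name(dest.name + ".part")
    try:
        fetch(part)
        if file_sha256(part) != expected_sha256:
            raise ValueError(f"checksum mismatch for {dest.name}")
        # Only a verified file ever reaches the final path.
        os.replace(part, dest)
    finally:
        # No-op after a successful move; removes the temp on any failure.
        part.unlink(missing_ok=True)
    return True
```

Because the final path is written only by the rename, an interrupted or corrupted download leaves at most a .part file, which the finally clause removes.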
Levi Neuwirth 77e31efdae Add link archive system: snapshots, backlinks, link-rot
Preserve external works the site cites against link rot, host them at
permanent /archive/<slug>/ URLs in site chrome, and treat them as
first-class citizens of the backlinks and similar-pages indexes.
Curated, not crawled: the author adds one line to archive/manifest.yaml
and the build fetches, hashes, snapshots, and indexes the work.

* archive/manifest.yaml + tools/archive.py (fetch / refresh / wayback /
  check / gc) — PDFs downloaded directly, HTML pages snapshotted with a
  vendored monolith (tools/bin/monolith @ 2.10.1) into a single
  self-contained file with the archive CSP and a noarchive robots meta
  injected. Per-entry PROVENANCE.json committed; gitignored .txt
  sidecars regenerated from the artifact's SHA-256.
* build/Archive.hs + build/ArchiveIndex.hs + build/Filters/Archive.hs
  — Hakyll rules for /archive/ and /archive/<slug>/, a body Pandoc
  filter that appends an archive affordance to live citations and
  flips dead ones to the local copy on archive.py check's asymmetric
  hysteresis (rotted needs 3 fails over >= 14 days; one ok recovers).
* build/Backlinks.hs — keeps archived external URLs through pass 1 and
  canonicalises them to /archive/<slug>/ in pass 2, producing a
  "Referenced by" section grouped by the fragment each citation
  targets. build/Stats.hs gains a "Link archive" telemetry block on
  /build/ (count, total size, median age, by-status / by-quality /
  by-visibility, orphans).
* Integrity: archive.py fetch and build/Archive.hs (via sha256sum)
  both re-hash every committed artifact, so a tampered file halts the
  build even with cabal invoked directly or no .venv present. refresh
  refuses to replace an uncommitted prior snapshot and rolls back
  atomically on any exit path. removed.yaml is honoured by fetch,
  wayback, and check using canonical-form (tracking-stripped,
  arXiv-canonicalised) comparison.
* visibility: private keeps an entry in-repo but undeployed.
  nginx/archive.conf emits X-Robots-Tag: noindex, noarchive for raw
  artifacts that cannot carry meta directives.

The full design, phase plan (1-5), and three refinement passes live
in ARCHIVE.md.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-23 10:06:33 -04:00
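
The asymmetric hysteresis mentioned for archive.py check (rotted needs 3 fails over >= 14 days; one ok recovers) can be illustrated with a small state machine. This is a sketch under assumptions, not the tool's implementation: it treats the fails as consecutive, measures the 14 days from the first fail in the current streak, and uses the invented names LinkState and record_check plus the status strings "live" and "rotted".

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

ROT_MIN_FAILS = 3                    # from the commit message
ROT_MIN_SPAN = timedelta(days=14)    # from the commit message


@dataclass
class LinkState:
    status: str = "live"             # assumed status names: "live" / "rotted"
    fail_count: int = 0
    first_fail: Optional[date] = None


def record_check(state: LinkState, ok: bool, today: date) -> LinkState:
    """Fold one check result into the link's state."""
    if ok:
        # Recovery is immediate: a single success clears the streak.
        return LinkState()
    first = state.first_fail or today
    fails = state.fail_count + 1
    # Rot needs both enough fails and enough elapsed time since the first.
    rotted = state.status == "rotted" or (
        fails >= ROT_MIN_FAILS and today - first >= ROT_MIN_SPAN
    )
    return LinkState("rotted" if rotted else "live", fails, first)
```

The asymmetry means a short outage cannot flip a citation to the local copy, while a link that comes back is restored on the next successful check.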