13 Commits

Author SHA1 Message Date
Levi Neuwirth 7c5354efa7 embed.py: split page vs paragraph embedding models
Pages (similar-links.json, build-only) move to nomic-embed-text-v1.5
(768d) with an on-disk npz cache; paragraphs (browser semantic search)
stay on all-MiniLM-L6-v2 (384d), so the client contract is unchanged.
WRITING.md search row updated accordingly. einops added for nomic's
remote modeling code; cache gitignored with a trailing glob so
interrupted-write debris is covered too.

Known follow-ups (AUDIT-2026-06-09.md §1.3, §4): pin the
nomic-bert-2048 remote code, catch BadZipFile in cache loads, fix the
staleness check defeated by stamp-build-time ordering.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-09 18:57:43 -04:00
Levi Neuwirth 77e31efdae Add link archive system: snapshots, backlinks, link-rot
Preserve external works the site cites against link rot, host them at
permanent /archive/<slug>/ URLs in site chrome, and treat them as
first-class citizens of the backlinks and similar-pages indexes.
Curated, not crawled: the author adds one line to archive/manifest.yaml
and the build fetches, hashes, snapshots, and indexes the work.

* archive/manifest.yaml + tools/archive.py (fetch / refresh / wayback /
  check / gc) — PDFs downloaded directly, HTML pages snapshotted with a
  vendored monolith (tools/bin/monolith @ 2.10.1) into a single
  self-contained file with the archive CSP and a noarchive robots meta
  injected. Per-entry PROVENANCE.json committed; gitignored .txt
  sidecars regenerated from the artifact's SHA-256.
* build/Archive.hs + build/ArchiveIndex.hs + build/Filters/Archive.hs
  — Hakyll rules for /archive/ and /archive/<slug>/, a body Pandoc
  filter that appends an archive affordance to live citations and
  flips dead ones to the local copy on archive.py check's asymmetric
  hysteresis (rotted needs 3 fails over >= 14 days; one ok recovers).
* build/Backlinks.hs — keeps archived external URLs through pass 1 and
  canonicalises them to /archive/<slug>/ in pass 2, producing a
  "Referenced by" section grouped by the fragment each citation
  targets. build/Stats.hs gains a "Link archive" telemetry block on
  /build/ (count, total size, median age, by-status / by-quality /
  by-visibility, orphans).
* Integrity: archive.py fetch and build/Archive.hs (via sha256sum)
  both re-hash every committed artifact, so a tampered file halts the
  build even with cabal invoked directly or no .venv present. refresh
  refuses to replace an uncommitted prior snapshot and rolls back
  atomically on any exit path. removed.yaml is honoured by fetch,
  wayback, and check using canonical-form (tracking-stripped,
  arXiv-canonicalised) comparison.
* visibility: private keeps an entry in-repo but undeployed.
  nginx/archive.conf emits X-Robots-Tag: noindex, noarchive for raw
  artifacts that cannot carry meta directives.
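
The asymmetric hysteresis described for archive.py check can be sketched as below. This is a minimal illustration, not the code in tools/archive.py: the names LinkState and record_check are hypothetical, and it assumes "3 fails over >= 14 days" means at least three consecutive failed checks whose first and latest failures are at least 14 days apart.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Thresholds taken from the commit message; everything else is illustrative.
FAILS_TO_ROT = 3
MIN_FAIL_SPAN = timedelta(days=14)


@dataclass(frozen=True)
class LinkState:
    status: str = "live"                  # "live" or "rotted"
    fail_count: int = 0                   # consecutive failed checks
    first_fail: Optional[datetime] = None  # start of the current failure streak


def record_check(state: LinkState, ok: bool, now: datetime) -> LinkState:
    """Fold one check result into the state.

    Rotting is slow (several failures spread over time); recovery is
    immediate (a single successful check resets everything).
    """
    if ok:
        return LinkState()
    first_fail = state.first_fail or now
    fails = state.fail_count + 1
    rotted = fails >= FAILS_TO_ROT and (now - first_fail) >= MIN_FAIL_SPAN
    return LinkState("rotted" if rotted else "live", fails, first_fail)
```

Under this reading, three failures inside one week leave a link live, so a short outage never flips a citation to the local copy, while a single successful check is enough to restore the live link.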

The full design, phase plan (1-5), and three refinement passes live
in ARCHIVE.md.
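
The re-hash step from the Integrity bullet might look roughly like this on the Python side. It is a sketch under assumptions: the function names are hypothetical, and the "sha256" key in PROVENANCE.json is a guess at the schema rather than something this log confirms.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large PDFs are never held in memory whole."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_artifact(artifact: Path, provenance: Path) -> None:
    """Abort if the artifact no longer matches its recorded hash.

    Assumes the provenance file stores the expected hex digest under a
    "sha256" key (hypothetical field name).
    """
    expected = json.loads(provenance.read_text(encoding="utf-8"))["sha256"]
    actual = sha256_of(artifact)
    if actual != expected:
        raise SystemExit(
            f"integrity failure: {artifact} hashes to {actual}, expected {expected}"
        )
```

Raising SystemExit makes a mismatch fatal to the calling script, which matches the stated goal that a tampered file halts the build.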

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-23 10:06:33 -04:00
Levi Neuwirth 41c8033eee Prune stale README.*.md entries from .gitignore
README.profile.md, README.arcana.md, README.simd.md, README.icd.md,
README.neuropose.md never existed in the working tree. Cosmetic
cleanup; the credential-shaped patterns and content/drafts/ entries
are unaffected.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 15:09:13 -04:00
Levi Neuwirth cd94227acb Spec dilemma 2026-05-01 21:22:01 -04:00
Levi Neuwirth 42ba2bf972 Current rework 2026-04-26 19:42:47 -04:00
Levi Neuwirth 6585573dae States/Context/Embeddings fixes 2026-04-26 11:22:57 -04:00
Levi Neuwirth 3a95a05284 Fix broken PDF hyperlinks 2026-04-22 12:10:31 -04:00
Levi Neuwirth 913a374fb2 Professional content refactor 2026-04-22 11:46:57 -04:00
Levi Neuwirth b02e1e868d audit: tooling, deploy ordering, README, repo hygiene 2026-04-10 17:41:33 -04:00
Levi Neuwirth 728afd4c68 affiliation, cabal helper script 2026-03-26 08:14:50 -04:00
Levi Neuwirth 5cfbfbc0ef GPG signing, embedding pipeline, visualization filter, search timing, sig popups
- GPG page signing: dedicated signing subkey in ~/.gnupg-signing, sign-site.sh
  walks _site/**/*.html producing .sig files, preset-signing-passphrase.sh caches
  passphrase via gpg-preset-passphrase; make sign target; make deploy chains it
- Footer sig link: $url$.sig with hover popup showing ASCII armor (popups.js
  sigContent provider; .footer-sig-link bound explicitly to bypass footer exclusion)
- Public key at static/gpg/pubkey.asc
- Embedding pipeline: tools/embed.py encodes _site pages with nomic-embed-text-v1.5
  + FAISS IndexFlatIP, writes data/similar-links.json; staleness check skips when
  JSON is newer than all HTML; make build invokes via uv, skips gracefully if .venv absent
- SimilarLinks.hs: similarLinksField loads similar-links.json with Hakyll dependency
  tracking; renders Related section in page-footer.html
- uv environment: pyproject.toml + uv.lock (CPU-only torch via pytorch-cpu index)
- Visualization filter: Filters/Viz.hs runs Python scripts for .figure (SVG) and
  .visualization (Vega-Lite JSON) fenced divs; viz.js renders with monochrome config
  and MutationObserver dark-mode re-render; viz.css layout
- Search timing: #search-timing element shows elapsed ms via MutationObserver
- Build telemetry timestamps removed from git tracking (now in .gitignore)
- spec.md updated to v9; WRITING.md updated with viz, related, signing, build docs
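
The staleness check mentioned in the embedding pipeline bullet (skip when the JSON is newer than all HTML) can be sketched as an mtime comparison. This is an illustrative reading, not the code in tools/embed.py; is_stale and its parameters are hypothetical names.

```python
from pathlib import Path


def is_stale(output: Path, site_dir: Path) -> bool:
    """Return True when the embeddings output should be rebuilt.

    The output counts as fresh only if it exists and is strictly newer
    than every HTML file under site_dir.
    """
    if not output.exists():
        return True
    out_mtime = output.stat().st_mtime
    return any(
        page.stat().st_mtime >= out_mtime for page in site_dir.rglob("*.html")
    )
```

A caller would run the encoder only when is_stale(Path("data/similar-links.json"), Path("_site")) is true. A check of this shape depends on which build step touches which file last, so any step that rewrites the HTML or the JSON after the fact can skew the result.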

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-20 20:14:49 -04:00
Levi Neuwirth 9c47811372 Administrativa. 2026-03-17 22:31:24 -04:00
Levi Neuwirth 714824a0b5 initial deploy! whoop 2026-03-17 21:56:14 -04:00