15 Commits

Author SHA1 Message Date
Levi Neuwirth 945086421a embed.py: hash-cache the paragraph pass; drop the dead mtime skip
The 'skip if outputs newer than every HTML' check could never fire:
stamp-build-time.py rewrites every page's footer AFTER embed.py runs,
so the comparison was always false and the full MiniLM paragraph pass
(and model load) ran on every build (AUDIT §4.3). Replaced with the
same content-hash cache the page pass already had — generalized
load/save_vec_cache, keyed by sha256 of the input text, invalidated on
model/revision/dim change. A no-change rerun now does no model loads:
measured 97s cold -> 4.8s warm.

Also strips section.footnotes from extraction: the new no-JS fallback
duplicates each sidenote's text at document end, which would double
footnotes in search results and skew page similarity.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 10:51:01 -04:00
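
A minimal sketch of the content-hash cache the commit above describes, assuming a JSON file stands in for the real on-disk format. The names load_vec_cache and save_vec_cache come from the commit message; their signatures, the "meta"/"vectors" layout, and the embed_all helper are hypothetical and only illustrate keying by sha256 of the input text and invalidating on a model, revision, or dimension change.

```python
import hashlib
import json
from pathlib import Path


def text_key(text):
    """Cache key: sha256 of the exact input text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def load_vec_cache(path, model, revision, dim):
    """Return {sha256: vector}; empty if the file is missing, unreadable, or stale."""
    try:
        data = json.loads(Path(path).read_text(encoding="utf-8"))
    except (OSError, ValueError):
        return {}
    if not isinstance(data, dict):
        return {}
    # Any change of model, revision, or dimension invalidates the whole cache.
    if data.get("meta") != {"model": model, "revision": revision, "dim": dim}:
        return {}
    return data.get("vectors", {})


def save_vec_cache(path, model, revision, dim, vectors):
    """Write the cache to a temp file, then rename it over the target."""
    payload = {
        "meta": {"model": model, "revision": revision, "dim": dim},
        "vectors": vectors,
    }
    tmp = Path(str(path) + ".tmp")
    tmp.write_text(json.dumps(payload), encoding="utf-8")
    tmp.replace(path)


def embed_all(texts, encode, cache):
    """Return one vector per text, calling encode only for cache misses."""
    keys = [text_key(t) for t in texts]
    missing = {}
    for key, text in zip(keys, texts):
        if key not in cache and key not in missing:
            missing[key] = text
    if missing:
        # encode is whatever loads and runs the model; skipped when nothing is new.
        for key, vec in zip(missing, encode(list(missing.values()))):
            cache[key] = list(vec)
    return [cache[k] for k in keys]
```

Under these assumptions a no-change rerun never calls encode, so the model load can be deferred until the first cache miss.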
Levi Neuwirth aeb2937f7c Drafts are local-only: untrack the four committed ones
.gitignore has declared content/drafts/ local-only working notes since
the rule was added, but four drafts were already tracked — ignore rules
don't untrack, so make build's auto-commit kept staging and deploy kept
pushing them (AUDIT §6.3). Untracked with --cached; the files remain on
disk and still build in dev. Also moved inclusionist-manifesto.md into
drafts/essays/ where the draft rule actually matches it (§6.1), and
un-shadowed the tracked .env.example from the credential patterns.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 10:40:05 -04:00
Levi Neuwirth 7c5354efa7 embed.py: split page vs paragraph embedding models
Pages (similar-links.json, build-only) move to nomic-embed-text-v1.5
(768d) with an on-disk npz cache; paragraphs (browser semantic search)
stay on all-MiniLM-L6-v2 (384d), so the client contract is unchanged.
WRITING.md search row updated accordingly. einops added for nomic's
remote modeling code; cache gitignored with a trailing glob so
interrupted-write debris is covered too.

Known follow-ups (AUDIT-2026-06-09.md §1.3, §4): pin the
nomic-bert-2048 remote code, catch BadZipFile in cache loads, fix the
staleness check defeated by stamp-build-time ordering.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-09 18:57:43 -04:00
Levi Neuwirth 77e31efdae Add link archive system: snapshots, backlinks, link-rot
Preserve external works the site cites against link rot, host them at
permanent /archive/<slug>/ URLs in site chrome, and treat them as
first-class citizens of the backlinks and similar-pages indexes.
Curated, not crawled: the author adds one line to archive/manifest.yaml
and the build fetches, hashes, snapshots, and indexes the work.

* archive/manifest.yaml + tools/archive.py (fetch / refresh / wayback /
  check / gc) — PDFs downloaded directly, HTML pages snapshotted with a
  vendored monolith (tools/bin/monolith @ 2.10.1) into a single
  self-contained file with the archive CSP and a noarchive robots meta
  injected. Per-entry PROVENANCE.json committed; gitignored .txt
  sidecars regenerated from the artifact's SHA-256.
* build/Archive.hs + build/ArchiveIndex.hs + build/Filters/Archive.hs
  — Hakyll rules for /archive/ and /archive/<slug>/, a body Pandoc
  filter that appends an archive affordance to live citations and
  flips dead ones to the local copy on archive.py check's asymmetric
  hysteresis (rotted needs 3 fails over >= 14 days; one ok recovers).
* build/Backlinks.hs — keeps archived external URLs through pass 1 and
  canonicalises them to /archive/<slug>/ in pass 2, producing a
  "Referenced by" section grouped by the fragment each citation
  targets. build/Stats.hs gains a "Link archive" telemetry block on
  /build/ (count, total size, median age, by-status / by-quality /
  by-visibility, orphans).
* Integrity: archive.py fetch and build/Archive.hs (via sha256sum)
  both re-hash every committed artifact, so a tampered file halts the
  build even with cabal invoked directly or no .venv present. refresh
  refuses to replace an uncommitted prior snapshot and rolls back
  atomically on any exit path. removed.yaml is honoured by fetch,
  wayback, and check using canonical-form (tracking-stripped,
  arXiv-canonicalised) comparison.
* visibility: private keeps an entry in-repo but undeployed.
  nginx/archive.conf emits X-Robots-Tag: noindex, noarchive for raw
  artifacts that cannot carry meta directives.

The full design, phase plan (1-5), and three refinement passes live
in ARCHIVE.md.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-23 10:06:33 -04:00
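
The asymmetric hysteresis mentioned in the commit above (rotted needs 3 fails over >= 14 days; one ok recovers) could look roughly like the sketch below. This is one reading of that sentence, not the logic in tools/archive.py: it assumes the failures must be consecutive and that the 14 days are measured from the first failure of the current run. LinkState, record_check, and the status strings are hypothetical names.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

FAILS_TO_ROT = 3                    # from the commit message
MIN_ROT_SPAN = timedelta(days=14)   # from the commit message


@dataclass
class LinkState:
    status: str = "ok"              # "ok" or "rotted" (hypothetical labels)
    fail_count: int = 0             # consecutive failed checks
    first_fail: Optional[date] = None


def record_check(state, ok, today):
    """Fold one check result into the state and return the new state."""
    if ok:
        # Recovery is immediate: a single success clears the failure history.
        return LinkState()
    first = state.first_fail or today
    count = state.fail_count + 1
    # Rot is slow: enough failures AND enough elapsed time since the first one.
    rotted = count >= FAILS_TO_ROT and (today - first) >= MIN_ROT_SPAN
    return LinkState("rotted" if rotted else state.status, count, first)
```

The asymmetry means a brief outage never flips a citation to the local copy, while a link that comes back is trusted again on the next check.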
Levi Neuwirth 41c8033eee Prune stale README.*.md entries from .gitignore
README.profile.md, README.arcana.md, README.simd.md, README.icd.md,
README.neuropose.md never existed in the working tree. Cosmetic
cleanup; the credential-shaped patterns and content/drafts/ entries
are unaffected.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 15:09:13 -04:00
Levi Neuwirth cd94227acb Spec dilemma 2026-05-01 21:22:01 -04:00
Levi Neuwirth 42ba2bf972 Current rework 2026-04-26 19:42:47 -04:00
Levi Neuwirth 6585573dae States/Context/Embeddings fixes 2026-04-26 11:22:57 -04:00
Levi Neuwirth 3a95a05284 Fix broken PDF hyperlinks 2026-04-22 12:10:31 -04:00
Levi Neuwirth 913a374fb2 Professional content refactor 2026-04-22 11:46:57 -04:00
Levi Neuwirth b02e1e868d audit: tooling, deploy ordering, README, repo hygiene 2026-04-10 17:41:33 -04:00
Levi Neuwirth 728afd4c68 affiliation, cabal helper script 2026-03-26 08:14:50 -04:00
Levi Neuwirth 5cfbfbc0ef GPG signing, embedding pipeline, visualization filter, search timing, sig popups
- GPG page signing: dedicated signing subkey in ~/.gnupg-signing, sign-site.sh
  walks _site/**/*.html producing .sig files, preset-signing-passphrase.sh caches
  passphrase via gpg-preset-passphrase; make sign target; make deploy chains it
- Footer sig link: $url$.sig with hover popup showing ASCII armor (popups.js
  sigContent provider; .footer-sig-link bound explicitly to bypass footer exclusion)
- Public key at static/gpg/pubkey.asc
- Embedding pipeline: tools/embed.py encodes _site pages with nomic-embed-text-v1.5
  + FAISS IndexFlatIP, writes data/similar-links.json; staleness check skips when
  JSON is newer than all HTML; make build invokes via uv, skips gracefully if .venv absent
- SimilarLinks.hs: similarLinksField loads similar-links.json with Hakyll dependency
  tracking; renders Related section in page-footer.html
- uv environment: pyproject.toml + uv.lock (CPU-only torch via pytorch-cpu index)
- Visualization filter: Filters/Viz.hs runs Python scripts for .figure (SVG) and
  .visualization (Vega-Lite JSON) fenced divs; viz.js renders with monochrome config
  and MutationObserver dark-mode re-render; viz.css layout
- Search timing: #search-timing element shows elapsed ms via MutationObserver
- Build telemetry timestamps removed from git tracking (now in .gitignore)
- spec.md updated to v9; WRITING.md updated with viz, related, signing, build docs

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-20 20:14:49 -04:00
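
To illustrate the similar-links step in the commit above: FAISS IndexFlatIP performs exact inner-product search, and on unit-length vectors the inner product equals cosine similarity. The sketch below swaps in a plain-Python brute-force search so it runs without FAISS or a model. It assumes the embeddings are normalized before indexing, which the commit does not state; similar_links and the {url: [urls]} output shape are hypothetical, not the actual schema of data/similar-links.json.

```python
import math


def normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def similar_links(vectors, k=3):
    """vectors: {url: embedding}. Return {url: top-k other urls by inner product}."""
    unit = {url: normalize(vec) for url, vec in vectors.items()}
    result = {}
    for url, a in unit.items():
        scored = [
            (sum(x * y for x, y in zip(a, b)), other)
            for other, b in unit.items()
            if other != url  # a page is never its own neighbour
        ]
        # Highest score first; ties broken by URL so the output is deterministic.
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        result[url] = [other for _, other in scored[:k]]
    return result
```

Brute force is quadratic in the number of pages, which is also the cost profile of a flat (non-approximate) index.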
Levi Neuwirth 9c47811372 Administrativa. 2026-03-17 22:31:24 -04:00
Levi Neuwirth 714824a0b5 initial deploy! whoop 2026-03-17 21:56:14 -04:00