levineuwirth.org/tools
Levi Neuwirth 945086421a embed.py: hash-cache the paragraph pass; drop the dead mtime skip
The 'skip if outputs newer than every HTML' check could never fire:
stamp-build-time.py rewrites every page's footer AFTER embed.py runs,
so the comparison was always false and the full MiniLM paragraph pass
(and model load) ran on every build (AUDIT §4.3). Replaced with the
same content-hash cache the page pass already had — generalized
load/save_vec_cache, keyed by sha256 of the input text, invalidated on
model/revision/dim change. A no-change rerun now does no model loads:
measured 97s cold -> 4.8s warm.
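
A minimal sketch of the kind of content-hash cache described above, not the code in embed.py. The names `load_vec_cache` and `save_vec_cache` come from the commit message; the JSON layout, the `META` values, the `embed_all` helper and its `encode` callback are all assumptions made for illustration.

```python
import hashlib
import json
from pathlib import Path

# Hypothetical model identity; the real values live in embed.py.
META = {"model": "example/MiniLM", "revision": "0000000", "dim": 4}


def text_key(text):
    """Cache key: sha256 of the exact input text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def load_vec_cache(path, meta):
    """Return {sha256: vector}, or {} when the cache is missing,
    unreadable, or was built with a different model/revision/dim."""
    try:
        data = json.loads(Path(path).read_text(encoding="utf-8"))
    except (OSError, ValueError):
        return {}
    if not isinstance(data, dict) or data.get("meta") != meta:
        return {}
    vectors = data.get("vectors")
    return vectors if isinstance(vectors, dict) else {}


def save_vec_cache(path, meta, vectors):
    """Write the vectors together with the model identity they belong to."""
    payload = {"meta": meta, "vectors": vectors}
    Path(path).write_text(json.dumps(payload), encoding="utf-8")


def embed_all(texts, cache_path, encode, meta=META):
    """Embed texts, calling encode() only for texts not already cached.

    encode is any callable mapping list[str] -> list[list[float]];
    in a real pipeline it would load the model lazily on first use.
    """
    cache = load_vec_cache(cache_path, meta)
    keys = [text_key(t) for t in texts]
    missing = {k: t for k, t in zip(keys, texts) if k not in cache}
    if missing:
        for k, vec in zip(missing, encode(list(missing.values()))):
            cache[k] = vec
    # Keep only keys still in use so stale entries do not accumulate.
    live = {k: cache[k] for k in keys}
    save_vec_cache(cache_path, meta, live)
    return [live[k] for k in keys]
```

Because the key depends only on the text, a footer rewrite that leaves the extracted text unchanged produces the same key, which is how a no-change rerun can skip the model entirely.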

Also strips section.footnotes from extraction: the new no-JS fallback
duplicates each sidenote's text at document end, which would double
footnotes in search results and skew page similarity.
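
A sketch of how `section.footnotes` might be excluded during text extraction. It is a stdlib-only stand-in and assumes the footnote block is a `<section>` whose class list contains `footnotes`; the `TextExtractor` class and `extract_text` function are hypothetical names, and embed.py may well use a different HTML library.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text, skipping everything inside <section class="footnotes">."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip_depth = 0  # > 0 while inside a footnotes section

    def handle_starttag(self, tag, attrs):
        if tag != "section":
            return
        if self._skip_depth:
            # Nested section inside the skipped subtree: track its depth.
            self._skip_depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        if "footnotes" in classes:
            self._skip_depth = 1

    def handle_endtag(self, tag):
        if tag == "section" and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def extract_text(html):
    """Return whitespace-normalized text with footnote sections removed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())
```

Counting only `section` tags keeps the skip logic tolerant of unclosed `<p>` or `<li>` elements inside the footnote list.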

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 10:51:01 -04:00
Name | Last commit | Date
--- | --- | ---
bin | Add link archive system: snapshots, backlinks, link-rot | 2026-05-23 10:06:33 -04:00
hooks | Marks II: broader monogram coverage + audit-marks tool | 2026-05-23 12:05:08 -04:00
add-popup-source.sh | major visual changes - dingbats, footer, etc | 2026-04-17 12:48:22 -04:00
archive.py | Tooling robustness: atomic writes, verified downloads | 2026-06-10 09:43:25 -04:00
audit-marks.py | Marks II: broader monogram coverage + audit-marks tool | 2026-05-23 12:05:08 -04:00
compress-assets.sh | Validate tool inputs and surface tracebacks on errors | 2026-05-07 15:09:02 -04:00
convert-images.sh | Tooling robustness: atomic writes, verified downloads | 2026-06-10 09:43:25 -04:00
download-leaflet.sh | Tooling robustness: atomic writes, verified downloads | 2026-06-10 09:43:25 -04:00
download-model.sh | Tooling robustness: atomic writes, verified downloads | 2026-06-10 09:43:25 -04:00
download-pdfjs.sh | PDF compression | 2026-04-22 12:40:22 -04:00
embed.py | embed.py: hash-cache the paragraph pass; drop the dead mtime skip | 2026-06-10 10:51:01 -04:00
extract-dimensions.py | Validate tool inputs and surface tracebacks on errors | 2026-05-07 15:09:02 -04:00
extract-exif.py | Validate tool inputs and surface tracebacks on errors | 2026-05-07 15:09:02 -04:00
extract-palette.py | Validate tool inputs and surface tracebacks on errors | 2026-05-07 15:09:02 -04:00
import-photo.sh | Validate tool inputs and surface tracebacks on errors | 2026-05-07 15:09:02 -04:00
import-poetry.py | audit: tooling, deploy ordering, README, repo hygiene | 2026-04-10 17:41:33 -04:00
leaflet-checksums.sha256 | Spec dilemma | 2026-05-01 21:22:01 -04:00
model-checksums.sha256 | Pin Hugging Face model revisions for downloader and embed pipeline | 2026-05-07 15:08:14 -04:00
monolith-version.txt | Add link archive system: snapshots, backlinks, link-rot | 2026-05-23 10:06:33 -04:00
pdfjs-checksums.sha256 | Fix broken PDF hyperlinks | 2026-04-22 12:10:31 -04:00
preset-signing-passphrase.sh | GPG signing, embedding pipeline, visualization filter, search timing, sig popups | 2026-03-20 20:14:49 -04:00
refreeze.sh | affiliation, cabal helper script | 2026-03-26 08:14:50 -04:00
sign-site.sh | States/Context/Embeddings fixes | 2026-04-26 11:22:57 -04:00
stamp-build-time.py | Stamp the site-wide build time post-render | 2026-05-23 12:05:28 -04:00
subset-fonts.sh | initial deploy! whoop | 2026-03-17 21:56:14 -04:00
viz_theme.py | GPG signing, embedding pipeline, visualization filter, search timing, sig popups | 2026-03-20 20:14:49 -04:00