This commit is contained in:
Levi Neuwirth 2026-04-15 11:41:27 -04:00
parent d29f4f1b78
commit 8e1c8833f5
19 changed files with 2566 additions and 13 deletions

.gitignore vendored

@@ -85,6 +85,20 @@ site/
# Allow committing example videos under tests/fixtures/ explicitly.
!tests/fixtures/**
# --- Benchmark scratch directory -------------------------------------------
# ``benchmarks/videos/`` is a workflow convenience: drop benchmark inputs
# there, rsync the whole repo checkout to the research Mac, and the videos
# travel alongside the code without needing a second path on either end.
# Everything inside is ignored except the README; the global video-suffix
# rules above already cover common extensions, and this belt catches any
# other cruft (partial rsync transfers, .DS_Store, etc.) that might land
# in the directory.
#
# Policy reminder: benchmark test videos only. Subject / clinical data
# must still go through $XDG_DATA_HOME — see benchmarks/README.md.
/benchmarks/videos/*
!/benchmarks/videos/README.md
# --- Secrets ---------------------------------------------------------------
.env
.env.*


@@ -86,7 +86,13 @@ be split into per-release sections once tagging begins.
width, height), `VideoPredictions` (metadata envelope + frames
mapping + optional `segmentations` field), `JobResults`,
`JobStatus` enum, `JobStatusEntry` (with a structured `error`
field plus optional live-progress fields — `current_video`,
`frames_processed`, `frames_total`, `videos_completed`,
`videos_total`, `percent_complete`, `last_update` — populated by
the interfacer during inference and consumed by
`neuropose.monitor`), and `StatusFile`. Legacy status files
written before the progress fields existed still load cleanly
because every new field is optional with a `None` default. Performance schema: frozen
`PerformanceMetrics` carrying per-call timings
(`model_load_seconds`, `total_seconds`, `per_frame_latencies_ms`),
`peak_rss_mb`, `active_device`, `tensorflow_metal_active`, and
@@ -154,6 +160,48 @@ be split into per-release sections once tagging begins.
download was truncated). Post-load interface check for
`detect_poses`, `per_skeleton_joint_names`, and
`per_skeleton_joint_edges`.
- **`neuropose.monitor`** — localhost HTTP status dashboard. A small
`http.server`-based server (pure stdlib, zero new runtime
dependencies) that serves a plain HTML page at `GET /` with an
auto-refresh meta tag, one row per tracked job, a
`<progress>` bar, and a stale-entry warning badge for
`processing` jobs whose `last_update` has not ticked in 60 s.
`GET /status.json` returns the raw validated `StatusFile` as JSON
for `curl`/scripted pipelines; `?job=<name>` filters to a single
entry. `GET /health` is a simple liveness probe. Binds to
`127.0.0.1:8765` by default — loopback-only, with an explicit
`--host` override required to expose externally. Every request
re-reads `status.json`, so the monitor has no in-memory cache, no
sync protocol with the daemon, and stays useful even if the
daemon is down (last-known state surfaced with the stale badge).
- **Progress checkpointing in the interfacer.** `Interfacer` now
updates the currently-running job's `JobStatusEntry` every
`settings.status_checkpoint_every_frames` frames (default 30, a
new `Settings` field) during inference via the estimator's
`progress` callback. Each checkpoint rewrites `status.json`
atomically through the existing `save_status` helper; writes are
best-effort and I/O failures are logged without interrupting
inference. `_run_job_inner` seeds a "videos_total=N" checkpoint
before calling the estimator so the monitor shows the job's
scope from the first poll. Checkpoint cadence is knob-exposed for
operators who want to tune the smoothness-vs-write-rate trade-off.
- **`neuropose.ingest`** — zip-archive intake utility. `ingest_zip()`
extracts a zip of videos into one job directory per video under
`$data_dir/in/`, with validation-before-write (path-traversal and
absolute-path members rejected, oversize archives rejected at the
20 GB-uncompressed cap), zip-internal and external collision
detection reported in one shot, non-video members silently
skipped (`.DS_Store`, `README.md`, etc.), and per-job atomic
placement via a staging directory + `os.rename`. Nested paths are
flattened into job names by joining components with underscores
and sanitising unsafe characters — `patient_001/trial_01.mp4`
becomes job `patient_001_trial_01`, preserving disambiguation
against a sibling `patient_002/trial_01.mp4`. Typed exception
hierarchy: `IngestError`, `ArchiveInvalidError`,
`ArchiveEmptyError`, `ArchiveTooLargeError`, `JobCollisionError`
(with a `.collisions` list of offending names). The running
daemon needs no changes — ingested job dirs are picked up on the
next poll.
- **`neuropose.benchmark`** — multi-pass inference benchmarking for
a single video. `run_benchmark()` runs `process_video` N times
(default 5), always discards the first pass as warmup (graph
@@ -200,10 +248,19 @@ be split into per-release sections once tagging begins.
`slow`) loads the real model and asserts the constant still
matches, so any upstream skeleton drift fails CI.
- **`neuropose.cli`** — Typer-based command-line interface with
seven subcommands: `watch` (run the daemon), `process <video>`
(run the estimator on a single video), `ingest <archive>` (unzip
a video archive into per-video job directories under
`$data_dir/in/` with validation-before-write and atomic
placement; `--force` overwrites collisions, otherwise the whole
operation refuses if any target name already exists),
`serve` (start the localhost HTTP monitor at `127.0.0.1:8765`
by default — `--host` and `--port` are the two overrides;
KeyboardInterrupt exits with the standard shell-interruption
code and an `OSError` at bind time is translated to a clean
usage error with the bind target in the message),
`segment <results>` (post-hoc repetition segmentation — loads a
JobResults or a single VideoPredictions, runs
`neuropose.analyzer.segment.segment_predictions` with the chosen
extractor and thresholds, and atomically writes the file back
with the new segmentation attached under `--name`),

benchmarks/README.md Normal file

@@ -0,0 +1,115 @@
# `benchmarks/`
Workflow convenience directory for performance benchmarking. Drop
videos into `benchmarks/videos/`, then rsync the whole repo to a
research machine — the videos travel alongside the code without
needing a second path on either end.
This directory is **not** a persistent data store. It is a local
scratch area for ephemeral benchmark inputs. Everything under
`benchmarks/videos/` is gitignored (see the `.gitignore` rules and
the per-directory README) so committing videos by accident is
mechanically prevented.
## Policy
**Only put benchmark test videos here. Never subject or clinical
recordings.** Subject data still goes through
`$XDG_DATA_HOME/neuropose/` via the normal interfacer workflow, and
that rule is load-bearing for the project's data-handling posture.
The distinction in practice:
| Type of video | Goes here | Goes to `$XDG_DATA_HOME` |
| ------------------------------------------------ | --------- | ------------------------ |
| Synthetic test videos (gradient frames, etc.) | Yes | No |
| Public-domain reference footage | Yes | No |
| Recordings you personally filmed for benchmarking | Yes | No |
| Anything a clinician recorded | **No** | Yes |
| Anything with an identifiable subject | **No** | Yes |
| Anything IRB-gated | **No** | Yes |
When in doubt, route through `$XDG_DATA_HOME`. That path has
cross-machine isolation by design; this directory does not.
## Usage
Assuming you have a short test video to work with:
```console
$ cp ~/Downloads/short_clip.mp4 benchmarks/videos/
$ uv run neuropose benchmark benchmarks/videos/short_clip.mp4 \
--repeats 5 --warmup-frames 3 \
--output benchmarks/videos/short_clip_run.json
```
The `*.json` benchmark output is also gitignored — `.json` is not
a tracked extension inside `benchmarks/videos/` because only
`README.md` is whitelisted in that directory.
## Rsyncing to the research Mac
The directory layout is designed so one `rsync` path covers both
code and videos:
```console
$ rsync -av --delete \
--exclude='.venv/' \
--exclude='site/' \
--exclude='.git/' \
~/Repos/research/brown/shu/neuropose/ \
mac.local:~/Repos/research/brown/shu/neuropose/
```
After the sync, the videos in `benchmarks/videos/` on the Mac are
identical to the ones on Linux, so a benchmark run on the Mac can
reference the same filename the Linux report does, making
cross-machine comparisons trivial.
Tips:
- Add `--exclude='benchmarks/videos/*.json'` if you want to keep
per-machine benchmark results isolated.
- `--delete` makes the target exactly mirror the source. Without
it, old files on the target persist — safer but surprising.
- For one-off pushes, `scp benchmarks/videos/clip.mp4
mac.local:~/Repos/research/brown/shu/neuropose/benchmarks/videos/`
works without touching the rest of the repo.
## Bulk intake via `neuropose ingest`
When you have a whole batch of recordings instead of a single
benchmark clip, drop the zip archive anywhere you like (the
`benchmarks/videos/` directory is fine for transient test zips) and
let `neuropose ingest` unpack them into per-video job directories
for the running daemon:
```console
$ uv run neuropose ingest benchmarks/videos/session_2026-04-15.zip
ingested 12 job(s) from benchmarks/videos/session_2026-04-15.zip (1842.3 MB, 2 non-video member(s) skipped)
patient_001_trial_01/trial_01.mp4
patient_001_trial_02/trial_02.mp4
...
```
Each video becomes its own `$data_dir/in/<job_name>/` directory,
and the daemon picks them up on its next poll with no further
operator action. Pass `--force` to overwrite existing job
directories with the same derived names. See the `neuropose
ingest --help` output for the full flag surface and
[`neuropose.ingest`](../docs/api/ingest.md) for the library API.
**Reminder:** the `benchmarks/videos/` directory is for benchmark
test data only, including the zip archives you pass to `ingest`.
Actual clinical recordings should not transit through this
directory — route those through `$XDG_DATA_HOME` instead.
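The job names in the listing above come from flattening each member's in-archive path. A rough sketch of that derivation (illustrative only; the real sanitiser in `neuropose.ingest` may differ in edge cases):

```python
import re
from pathlib import PurePosixPath

# Characters outside this set become "_" (assumed pattern for this sketch).
_SAFE = re.compile(r"[^A-Za-z0-9._-]+")

def derive_job_name(member_name: str) -> str:
    """Flatten an in-archive path into a job name (sketch)."""
    p = PurePosixPath(member_name)
    flat = "_".join(p.parts)  # patient_001/trial_01.mp4 -> patient_001_trial_01.mp4
    stem = flat[: -len(p.suffix)] if p.suffix else flat  # drop the extension
    return _SAFE.sub("_", stem)  # sanitise anything unsafe
```

So `patient_001/trial_01.mp4` and `patient_002/trial_01.mp4` map to distinct jobs, as the listing above shows.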
## Directory layout
```
benchmarks/
├── README.md # this file (tracked)
└── videos/
├── README.md # placeholder to keep the directory tracked
└── <your-clip>.mp4 # ignored
```


@@ -0,0 +1,12 @@
# `benchmarks/videos/`
Scratch directory for benchmark input videos. Everything in here
except this README is gitignored.
**Drop benchmark test videos only. Never subject or clinical
recordings.** See `../README.md` for the full policy and the
rsync-to-Mac workflow.
This file exists so the directory itself is trackable — git does
not track empty directories, and a placeholder README is more
discoverable than a `.gitkeep`.

docs/api/ingest.md Normal file

@@ -0,0 +1,3 @@
# `neuropose.ingest`
::: neuropose.ingest

docs/api/monitor.md Normal file

@@ -0,0 +1,3 @@
# `neuropose.monitor`
::: neuropose.monitor


@@ -68,6 +68,77 @@ output. This is the "is `tensorflow-metal` producing correct
numerics?" check that `RESEARCH.md`'s TensorFlow-version-compatibility
section leaves open for v0.1.
### monitor
**Role:** localhost HTTP dashboard. Reads `$data_dir/out/status.json`
on every request and serves a small HTML page (with an auto-refresh
`<meta>` tag, per-job progress bars, and stale-entry warnings) plus
the raw `StatusFile` as JSON at `/status.json`. Runs as a separate
process from the daemon — operators start it with `neuropose serve`,
it is safe to run alongside `neuropose watch`, and it stays useful
even if the daemon is down (last-known state is shown with the stale
badge flagging any lingering `processing` entries).
Design choices:
- **Pure stdlib.** The server is built on `http.server` with a small
request-handler subclass. No FastAPI, no Flask — this is a localhost
tool and the cost of a framework is not justified.
- **Loopback by default.** Binds to `127.0.0.1:8765` with an explicit
`--host` flag to override. Collaborators on the same machine reach
it directly; collaborators elsewhere should go through an SSH
tunnel or explicitly configured reverse proxy. Binding to
`0.0.0.0` is a real network-exposure decision the operator should
make with eyes open.
- **No cache.** Every request re-reads `status.json`, which is tiny
and already written atomically by the daemon, so no sync protocol
is needed between the two processes.
- **Two surfaces, same data.** `GET /` renders HTML for browsers;
`GET /status.json` returns the raw `StatusFile` for `curl` /
scripted pipelines. `?job=<name>` filters to a single entry for
programmatic consumers that only care about one job.
- **Live progress data comes from the interfacer**, not from the
monitor. The interfacer checkpoints the currently-running job's
`JobStatusEntry` every
`settings.status_checkpoint_every_frames` frames (default 30) via
the estimator's `progress` callback, updating `frames_processed`,
`percent_complete`, and `last_update` on the in-memory status file
and calling `save_status`. The monitor then reads those fields and
renders the progress bar — no separate live channel, no in-memory
cache on the monitor side.
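A minimal sketch of that checkpoint cadence (illustrative: `save_status_atomic` is a hypothetical stand-in for the real `save_status` helper, and the plain-dict status is a simplification of the validated `StatusFile` model):

```python
import json
import logging
import os
import tempfile
import time
from pathlib import Path

def save_status_atomic(path: Path, status: dict) -> None:
    # Stand-in for save_status: write a sibling temp file, then
    # os.replace() so readers never observe a partial status.json.
    fd, tmp = tempfile.mkstemp(dir=str(path.parent), suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(status, f)
    os.replace(tmp, path)

def make_progress_callback(status_path: Path, status: dict,
                           job_name: str, every: int = 30):
    """Build a progress callback that checkpoints every `every` frames."""
    def progress(frames_done: int, frames_total: int) -> None:
        if frames_done % every and frames_done != frames_total:
            return  # off-cadence frame: no write
        entry = status["jobs"][job_name]
        entry["frames_processed"] = frames_done
        entry["frames_total"] = frames_total
        entry["percent_complete"] = round(100 * frames_done / frames_total, 1)
        entry["last_update"] = time.time()
        try:
            save_status_atomic(status_path, status)
        except OSError:
            # Best-effort: a failed status write must not stop inference.
            logging.getLogger(__name__).warning("status checkpoint failed")
    return progress
```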
### ingest
**Role:** bulk-intake utility. Accepts a zip archive of videos and
produces one job directory per video under `$data_dir/in/`, ready for
the `interfacer` daemon to pick up on its next poll.
The ingester is a **pure library call** (`ingest_zip`) plus a thin
CLI wrapper (`neuropose ingest`). It does not run inference itself —
it only stages files so the existing watch-directory pipeline does
the work. Key guarantees:
- **Validate-before-write.** Path-traversal members, zip bombs, and
empty archives are rejected before any file lands on disk, so a
failed ingest leaves the operator with a clean state.
- **Transactional placement.** Each video is extracted to a staging
directory under `$data_dir/.ingest_<uuid>/`, and only then
atomically renamed into `$data_dir/in/<job_name>/`. The daemon
never sees a half-populated job directory.
- **Collision detection is up-front and exhaustive.** Zip-internal
collisions (two videos that flatten to the same job name) and
external collisions (a job directory already exists) are reported
as a single error listing every offending name. `--force`
deletes-and-replaces; without it, nothing is written.
- **Flattening preserves disambiguation.** The in-archive path is
joined with underscores into the job name — `patient_001/trial_01.mp4`
→ job `patient_001_trial_01` — so nested organisation survives the
flattening without collapsing into silent collisions.
The set of accepted extensions comes from
`neuropose.interfacer.VIDEO_EXTENSIONS`, so any format the daemon
can already process is a valid ingest target.
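The transactional-placement guarantee can be sketched in isolation (validation and collision handling elided; this is not the real `ingest_zip`):

```python
import shutil
import uuid
import zipfile
from pathlib import Path

def stage_then_rename(archive_path: Path, input_dir: Path) -> list[Path]:
    """Extract to a staging dir, then flip each job into place atomically."""
    # Staging lives next to input_dir so the final rename stays on the
    # same filesystem and is therefore a single atomic rename(2).
    staging = input_dir.parent / f".ingest_{uuid.uuid4().hex}"
    staging.mkdir(parents=True)
    placed: list[Path] = []
    try:
        with zipfile.ZipFile(archive_path) as zf:
            for member in zf.infolist():
                if member.is_dir():
                    continue
                job_dir = staging / Path(member.filename).stem
                job_dir.mkdir()
                with zf.open(member) as src, \
                        (job_dir / Path(member.filename).name).open("wb") as dst:
                    shutil.copyfileobj(src, dst)
        # Only after the whole archive extracted cleanly do the renames
        # happen, so a watcher never sees a half-populated job directory.
        for job_dir in sorted(staging.iterdir()):
            target = input_dir / job_dir.name
            job_dir.rename(target)
            placed.append(target)
        return placed
    finally:
        shutil.rmtree(staging, ignore_errors=True)
```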
### interfacer
**Role:** job-lifecycle daemon. Watches `input_dir` for new job


@@ -93,6 +93,8 @@ nav:
- neuropose.config: api/config.md
- neuropose.estimator: api/estimator.md
- neuropose.interfacer: api/interfacer.md
- neuropose.ingest: api/ingest.md
- neuropose.monitor: api/monitor.md
- neuropose.io: api/io.md
- neuropose.benchmark: api/benchmark.md
- neuropose.analyzer.segment: api/segment.md


@@ -1,11 +1,17 @@
"""NeuroPose command-line interface.
Seven subcommands:
- ``neuropose watch`` run the :class:`~neuropose.interfacer.Interfacer`
daemon against the configured input directory.
- ``neuropose process <video>`` run the estimator on a single video and
write the predictions JSON to disk.
- ``neuropose ingest <archive>`` extract a zip archive of videos into
per-video job directories under ``$data_dir/in/``, queued for the
running daemon to process.
- ``neuropose serve`` start the :mod:`~neuropose.monitor` localhost
HTTP dashboard so collaborators can watch a run's progress in a
browser or via ``curl``.
- ``neuropose segment <results>`` post-hoc repetition segmentation of
an existing predictions file. Attaches a named
:class:`~neuropose.io.Segmentation` to every video it contains and
@@ -256,6 +262,170 @@ def process(
typer.echo(f"wrote {out_path} ({result.frame_count} frames)")
# ---------------------------------------------------------------------------
# ingest
# ---------------------------------------------------------------------------
@app.command()
def ingest(
ctx: typer.Context,
archive: Annotated[
Path,
typer.Argument(
exists=True,
file_okay=True,
dir_okay=False,
readable=True,
help=(
"Path to a zip archive of videos. Each video inside becomes "
"its own job directory under $data_dir/in/ and is picked up "
"by the running daemon on its next poll."
),
),
],
force: Annotated[
bool,
typer.Option(
"--force",
help=(
"Overwrite any existing job directories that collide with "
"the archive's contents. Without --force, the first "
"collision aborts the whole operation without writing to "
"disk."
),
),
] = False,
) -> None:
"""Ingest a zip archive of videos into the daemon's input directory.
Each video member of the archive produces one new job directory in
``$data_dir/in/<job_name>/``. Job names are derived from the
in-archive path (slashes become underscores, extension dropped,
unsafe characters sanitized) so nested structures like
``patient_001/trial_01.mp4`` and ``patient_002/trial_01.mp4``
cannot collide into the same job.
The ingest is transactional: either every video in the archive is
placed, or none are. Zip-internal collisions, external collisions
with existing job directories, path-traversal members, and
total-size overruns are all caught before any disk writes happen.
Non-video members (``README.md``, ``.DS_Store``, etc.) are silently
skipped. The set of accepted extensions matches the interfacer's
:data:`~neuropose.interfacer.VIDEO_EXTENSIONS`.
"""
# Deferred import keeps the ingest module's dependencies (zipfile,
# shutil) out of the CLI's top-level import surface.
from neuropose.ingest import (
ArchiveEmptyError,
ArchiveInvalidError,
ArchiveTooLargeError,
IngestError,
JobCollisionError,
ingest_zip,
)
settings: Settings = ctx.obj
try:
result = ingest_zip(
archive,
input_dir=settings.input_dir,
force=force,
)
except JobCollisionError as exc:
typer.echo(f"error: {exc}", err=True)
for name in exc.collisions:
typer.echo(f" - {name}", err=True)
raise typer.Exit(code=EXIT_USAGE) from exc
except ArchiveEmptyError as exc:
typer.echo(f"error: {exc}", err=True)
raise typer.Exit(code=EXIT_USAGE) from exc
except ArchiveTooLargeError as exc:
typer.echo(f"error: {exc}", err=True)
raise typer.Exit(code=EXIT_USAGE) from exc
except ArchiveInvalidError as exc:
typer.echo(f"error: {exc}", err=True)
raise typer.Exit(code=EXIT_USAGE) from exc
except IngestError as exc:
typer.echo(f"error: {exc}", err=True)
raise typer.Exit(code=EXIT_USAGE) from exc
typer.echo(
f"ingested {result.job_count} job(s) from {archive} "
f"({result.total_uncompressed_bytes / (1024 * 1024):.1f} MB, "
f"{len(result.skipped_non_videos)} non-video member(s) skipped)"
)
for job in result.ingested:
typer.echo(f" {job.job_name}/{job.video_filename}")
# ---------------------------------------------------------------------------
# serve
# ---------------------------------------------------------------------------
@app.command()
def serve(
ctx: typer.Context,
host: Annotated[
str,
typer.Option(
"--host",
help=(
"Interface to bind. Defaults to 127.0.0.1 so the "
"monitor is loopback-only; pass 0.0.0.0 or an explicit "
"external IP only when you have thought through the "
"network exposure."
),
),
] = "127.0.0.1",
port: Annotated[
int,
typer.Option(
"--port",
min=1,
max=65535,
help="TCP port to listen on.",
),
] = 8765,
) -> None:
"""Serve the NeuroPose localhost status dashboard.
Reads the daemon's ``status.json`` (at ``Settings.status_file``)
on every HTTP request and renders a small dashboard at ``/`` plus
JSON at ``/status.json`` and a liveness probe at ``/health``.
Safe to run alongside ``neuropose watch``; safe to run without
the daemon (last-known state is served with a stale-entry warning
for any lingering ``processing`` jobs).
Blocks in ``serve_forever`` until interrupted by SIGINT or
SIGTERM, at which point the server shuts down cleanly.
"""
settings: Settings = ctx.obj
# Deferred import: the monitor module and its stdlib HTTP server
# are only needed for this subcommand, and keeping them off the
# watch/process hot path avoids paying their import cost on the
# inference daemon's startup.
from neuropose.monitor import serve_forever
status_path = settings.status_file
typer.echo(f"serving neuropose monitor on http://{host}:{port}/")
typer.echo(f" reading {status_path}")
typer.echo(f" JSON: http://{host}:{port}/status.json")
typer.echo(" press Ctrl-C to stop")
try:
serve_forever(status_path, host=host, port=port)
except KeyboardInterrupt as exc:
raise typer.Exit(code=EXIT_INTERRUPTED) from exc
except OSError as exc:
# Typical failure: address already in use, or permission
# denied binding to a privileged port. Translate to a clean
# usage error rather than a raw traceback.
typer.echo(f"error: could not bind {host}:{port}: {exc}", err=True)
raise typer.Exit(code=EXIT_USAGE) from exc
# ---------------------------------------------------------------------------
# segment
# ---------------------------------------------------------------------------


@@ -78,6 +78,19 @@ class Settings(BaseSettings):
poll_interval_seconds: int = Field(default=10, ge=1)
device: str = Field(default="/CPU:0")
default_fov_degrees: float = Field(default=55.0, gt=0.0, lt=180.0)
status_checkpoint_every_frames: int = Field(
default=30,
ge=1,
description=(
"Write the in-progress job's status entry back to status.json "
"every N frames during inference. Powers the live progress "
"bar in neuropose.monitor. Lower values produce smoother "
"progress at the cost of more atomic-rename writes; higher "
"values are cheaper but leave collaborators staring at a "
"stale percentage for longer. The default (30) is a smooth "
"update every ~3 s at 10 fps inference."
),
)
@field_validator("device")
@classmethod

src/neuropose/ingest.py Normal file

@@ -0,0 +1,421 @@
"""Zip-archive ingestion for the NeuroPose job queue.
Operators frequently arrive with a zip archive of videos pulled off a
camera SD card, exported from a study laptop, etc., and want every
video inside processed as an independent job without manually
unpacking the archive, renaming collisions, and copying files into
``$data_dir/in/<job_name>/`` one at a time. This module automates
exactly that workflow.
Responsibilities
----------------
- **Validate the archive before touching disk.** Path-traversal members
(absolute paths, ``..`` components) are refused; the total
uncompressed size is capped; only files whose suffixes are in
:data:`neuropose.interfacer.VIDEO_EXTENSIONS` are considered as
candidates. Non-video members are logged and skipped, not treated
as errors; zips commonly carry ``.DS_Store``, ``README.md``, etc.
- **Derive one job name per video.** The in-archive path is
slash-replaced, extension-stripped, and sanitized so that nested
structure like ``patient_001/trial_01.mp4`` becomes
``patient_001_trial_01`` and never collides with a sibling
``trial_01.mp4`` in another directory.
- **Atomic per-job placement.** Each video is extracted to a staging
directory under ``$data_dir/.ingest_<uuid>/`` first. Only after the
whole archive has been successfully extracted does the ingester
rename each staged job directory into ``$data_dir/in/<job_name>/``
with :func:`os.rename`, which is atomic on the same filesystem.
This way the daemon never sees a half-extracted job directory and
the "empty directory mid-copy" skip heuristic in the interfacer is
never exercised during ingest.
- **Collision detection, up-front.** Zip-internal collisions (two
videos that would flatten to the same job name) and external
collisions (a job directory of the same name already exists under
``in/``) are detected *before* any disk write. The default is to
refuse the operation and report every colliding name at once;
``force=True`` deletes the colliding ``in/<job_name>/`` directories
and proceeds.
This module is pure library: the CLI wrapper lives in
:mod:`neuropose.cli`, and the function here does not touch the
logger's configuration. Errors surface as typed exceptions so the
CLI can translate them into stable exit codes.
"""
from __future__ import annotations
import logging
import re
import shutil
import uuid
import zipfile
from dataclasses import dataclass, field
from pathlib import Path, PurePosixPath
from neuropose.interfacer import VIDEO_EXTENSIONS
logger = logging.getLogger(__name__)
# ---------------------------------------------------------------------------
# Tunables
# ---------------------------------------------------------------------------
MAX_UNCOMPRESSED_BYTES: int = 20 * 1024 * 1024 * 1024 # 20 GB
"""Maximum total uncompressed size of a single zip archive.
Not a hard clinical limit just a zip-bomb guard. A real research
archive with a few dozen multi-GB recordings stays well below this
number; anything above it is almost certainly a mistake or an attack.
The cap is enforced by summing :attr:`zipfile.ZipInfo.file_size` over
every candidate member *before* any extraction starts, so a bomb that
expands to 100 GB is rejected without writing a single byte to disk.
"""
_JOB_NAME_SAFE_PATTERN = re.compile(r"[^A-Za-z0-9._-]+")
"""Characters allowed in a derived job name. Everything else becomes ``_``."""
# ---------------------------------------------------------------------------
# Exceptions
# ---------------------------------------------------------------------------
class IngestError(Exception):
"""Base class for errors raised by :func:`ingest_zip`."""
class ArchiveInvalidError(IngestError):
"""The zip archive is malformed, unreadable, or contains unsafe members."""
class ArchiveEmptyError(IngestError):
"""The archive contains no files with supported video extensions."""
class ArchiveTooLargeError(IngestError):
"""The archive's total uncompressed size exceeds :data:`MAX_UNCOMPRESSED_BYTES`."""
class JobCollisionError(IngestError):
"""One or more target job directories already exist, or two videos collapse to the same name.
The ``collisions`` attribute lists the colliding job names so the
caller can report them in one shot rather than piecemeal.
"""
def __init__(self, message: str, collisions: list[str]) -> None:
super().__init__(message)
self.collisions = collisions
# ---------------------------------------------------------------------------
# Result container
# ---------------------------------------------------------------------------
@dataclass(frozen=True)
class IngestedJob:
"""A single job directory created by :func:`ingest_zip`."""
job_name: str
job_dir: Path
video_filename: str
uncompressed_bytes: int
@dataclass(frozen=True)
class IngestResult:
"""Aggregate result of :func:`ingest_zip`."""
archive: Path
ingested: list[IngestedJob] = field(default_factory=list)
skipped_non_videos: list[str] = field(default_factory=list)
@property
def job_count(self) -> int:
"""Return the number of job directories created."""
return len(self.ingested)
@property
def total_uncompressed_bytes(self) -> int:
"""Return the total uncompressed size of all ingested videos in bytes."""
return sum(job.uncompressed_bytes for job in self.ingested)
# ---------------------------------------------------------------------------
# Public API
# ---------------------------------------------------------------------------
def ingest_zip(
archive_path: Path,
input_dir: Path,
*,
force: bool = False,
) -> IngestResult:
"""Extract videos from a zip archive into per-video job directories.
Parameters
----------
archive_path
Path to the zip archive to ingest.
input_dir
The interfacer's input directory — typically
``Settings.input_dir`` (``$data_dir/in``). Each video in the
archive produces a new subdirectory here, and the running
daemon picks them up on its next poll.
force
If ``True``, overwrite any pre-existing ``input_dir/<job_name>/``
directories that collide with the archive's contents by
removing them first. If ``False`` (default), raise
:class:`JobCollisionError` listing every colliding name and
perform no disk writes.
Returns
-------
IngestResult
Record of which jobs were created and which archive members
were skipped as non-video.
Raises
------
FileNotFoundError
If ``archive_path`` does not exist.
ArchiveInvalidError
If the archive is not a valid zip, contains a member with an
absolute path or traversal components, or cannot be read.
ArchiveEmptyError
If the archive contains no files with supported video
extensions.
ArchiveTooLargeError
If the total uncompressed size of candidate video members
exceeds :data:`MAX_UNCOMPRESSED_BYTES`.
JobCollisionError
If two members of the archive flatten to the same job name,
or if any target ``input_dir/<job_name>/`` exists and
``force`` is ``False``.
"""
if not archive_path.exists():
raise FileNotFoundError(f"archive not found: {archive_path}")
input_dir.mkdir(parents=True, exist_ok=True)
# Phase 1: inspect the archive, decide every target job name, and
# catch all collisions before touching disk.
plan = _plan_ingest(archive_path, input_dir=input_dir, force=force)
# Phase 2: extract every video into a staging directory under the
# same filesystem as input_dir, so the final os.rename is atomic.
# Using a uuid in the staging directory name avoids collisions
# with concurrent ingests and with the daemon's lock file.
staging_root = input_dir.parent / f".ingest_{uuid.uuid4().hex}"
staging_root.mkdir(parents=True, exist_ok=False)
ingested: list[IngestedJob] = []
try:
with zipfile.ZipFile(archive_path, "r") as archive:
for entry in plan.entries:
staged_job_dir = staging_root / entry.job_name
staged_job_dir.mkdir(parents=True, exist_ok=False)
staged_video = staged_job_dir / entry.output_filename
with (
archive.open(entry.member) as src,
staged_video.open("wb") as dst,
):
shutil.copyfileobj(src, dst)
# Phase 3: move the staged job directories into place. The
# delete-then-rename dance under --force is only non-atomic
# per-job — each individual job still flips from "does not
# exist" to "fully populated" in one rename(2) call.
for entry in plan.entries:
staged_job_dir = staging_root / entry.job_name
target_job_dir = input_dir / entry.job_name
if force and target_job_dir.exists():
shutil.rmtree(target_job_dir)
staged_job_dir.rename(target_job_dir)
ingested.append(
IngestedJob(
job_name=entry.job_name,
job_dir=target_job_dir,
video_filename=entry.output_filename,
uncompressed_bytes=entry.member.file_size,
)
)
except Exception:
# Clean up whatever made it into input_dir so the caller is
# not left with a half-ingested state on, e.g., a zip that
# turns out to be truncated partway through extraction. We do
# NOT unwind under success; this branch is the failure path.
for job in ingested:
shutil.rmtree(job.job_dir, ignore_errors=True)
raise
finally:
shutil.rmtree(staging_root, ignore_errors=True)
logger.info(
"Ingested %d job(s) from %s (%d non-video members skipped)",
len(ingested),
archive_path,
len(plan.skipped_non_videos),
)
return IngestResult(
archive=archive_path,
ingested=ingested,
skipped_non_videos=plan.skipped_non_videos,
)
# ---------------------------------------------------------------------------
# Private planning helpers
# ---------------------------------------------------------------------------
@dataclass(frozen=True)
class _PlannedEntry:
"""One zip member bound for a specific job directory."""
member: zipfile.ZipInfo
job_name: str
output_filename: str
@dataclass(frozen=True)
class _IngestPlan:
"""Result of :func:`_plan_ingest` — a concrete set of extractions."""
entries: list[_PlannedEntry]
skipped_non_videos: list[str]
def _plan_ingest(
archive_path: Path,
*,
input_dir: Path,
force: bool,
) -> _IngestPlan:
"""Scan the archive and produce a validated extraction plan.
No disk is touched except for opening the zip read-only. Every
failure mode that we want to surface before extraction (bad archive,
traversal, empty, too big, collision) is decided in this function
so :func:`ingest_zip` is either a no-op or a complete success.
"""
try:
archive = zipfile.ZipFile(archive_path, "r")
except zipfile.BadZipFile as exc:
raise ArchiveInvalidError(f"not a valid zip archive: {archive_path}") from exc
with archive:
try:
members = archive.infolist()
except Exception as exc:
raise ArchiveInvalidError(
f"could not read archive member list: {archive_path}"
) from exc
entries: list[_PlannedEntry] = []
skipped: list[str] = []
job_name_to_member: dict[str, str] = {}
total_bytes = 0
for member in members:
if member.is_dir():
continue
member_name = member.filename
_check_member_path_safe(member_name)
# Filter by suffix. Non-video members are not errors — the
# zip can carry README.md, .DS_Store, etc. alongside the
# real payload.
if PurePosixPath(member_name).suffix.lower() not in VIDEO_EXTENSIONS:
skipped.append(member_name)
continue
total_bytes += member.file_size
if total_bytes > MAX_UNCOMPRESSED_BYTES:
raise ArchiveTooLargeError(
f"uncompressed size exceeds {MAX_UNCOMPRESSED_BYTES} bytes "
f"(cap) after including {member_name}; refusing to extract"
)
job_name = _derive_job_name(member_name)
output_filename = PurePosixPath(member_name).name
if job_name in job_name_to_member:
raise JobCollisionError(
"zip contains two videos that flatten to the same job name "
f"{job_name!r}: {job_name_to_member[job_name]!r} and {member_name!r}",
collisions=[job_name],
)
job_name_to_member[job_name] = member_name
entries.append(
_PlannedEntry(
member=member,
job_name=job_name,
output_filename=output_filename,
)
)
if not entries:
raise ArchiveEmptyError(
f"archive {archive_path} contains no video files with "
f"supported extensions ({sorted(VIDEO_EXTENSIONS)})"
)
# External collisions: a pre-existing job directory with the same
# name. Report every collision at once so the operator sees the
# full picture in one error, not one at a time.
if not force:
existing = [entry.job_name for entry in entries if (input_dir / entry.job_name).exists()]
if existing:
raise JobCollisionError(
f"{len(existing)} job director{'y' if len(existing) == 1 else 'ies'} "
f"already exist{'s' if len(existing) == 1 else ''} in {input_dir}; "
f"pass --force to overwrite",
collisions=existing,
)
return _IngestPlan(entries=entries, skipped_non_videos=skipped)
def _check_member_path_safe(name: str) -> None:
"""Refuse zip members with absolute paths or ``..`` traversal.
Python's :mod:`zipfile` does not raise on these by default on
3.11, so we validate each member name before it is extracted.
"""
if not name:
raise ArchiveInvalidError("zip archive contains an empty member name")
if name.startswith("/") or (len(name) >= 2 and name[1] == ":"):
raise ArchiveInvalidError(f"zip archive contains an absolute-path member: {name!r}")
# Zip member names should use forward slashes per the spec; be
# defensive anyway and normalise backslashes before the
# component-level check on ``..`` below.
parts = PurePosixPath(name.replace("\\", "/")).parts
if any(part == ".." for part in parts):
raise ArchiveInvalidError(
f"zip archive contains a traversal member ({name!r}); refusing to extract"
)
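The traversal rules enforced above can be demonstrated standalone. This is an illustrative re-implementation (the `is_safe` helper below is hypothetical, not this module's function):

```python
import io
import zipfile
from pathlib import PurePosixPath

def is_safe(name: str) -> bool:
    # Mirror the checks above: non-empty, not absolute, no Windows
    # drive letter, and no ".." path component.
    if not name or name.startswith("/") or (len(name) >= 2 and name[1] == ":"):
        return False
    return ".." not in PurePosixPath(name.replace("\\", "/")).parts

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr("ok/clip.mp4", b"v")
    z.writestr("../escape.mp4", b"v")

with zipfile.ZipFile(buf) as z:
    for info in z.infolist():
        print(info.filename, is_safe(info.filename))
# ok/clip.mp4 True
# ../escape.mp4 False
```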
def _derive_job_name(member_name: str) -> str:
"""Convert an in-archive path into a safe job-directory name.
The full in-archive path (minus the final extension) is joined
with underscores so ``patient_001/trial_01.mp4`` becomes
``patient_001_trial_01`` and stays unambiguous even if a sibling
directory carries a file named ``trial_01.mp4``. Any character
outside ``[A-Za-z0-9._-]`` is replaced with an underscore, then
runs of underscores are collapsed, leading/trailing underscores
are stripped, and the result is non-empty by construction (a
pathological all-symbol name falls back to ``video``).
"""
path = PurePosixPath(member_name)
stem_parts = [*path.parts[:-1], path.stem]
raw = "_".join(stem_parts)
sanitised = _JOB_NAME_SAFE_PATTERN.sub("_", raw)
# Collapse runs of underscores and strip leading/trailing ones so
# the result doesn't look like `_foo__bar_`.
collapsed = re.sub(r"_+", "_", sanitised).strip("_")
return collapsed or "video"
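The derivation rules in the docstring can be exercised with a standalone re-implementation (illustrative sketch; the real logic is the function above):

```python
import re
from pathlib import PurePosixPath

def derive_job_name(member_name: str) -> str:
    # Join path components (minus the extension) with underscores,
    # replace unsafe characters, collapse runs, strip the edges.
    path = PurePosixPath(member_name)
    raw = "_".join([*path.parts[:-1], path.stem])
    sanitised = re.sub(r"[^A-Za-z0-9._-]", "_", raw)
    collapsed = re.sub(r"_+", "_", sanitised).strip("_")
    return collapsed or "video"

print(derive_job_name("patient_001/trial_01.mp4"))  # patient_001_trial_01
print(derive_job_name("a b/c&d.mp4"))               # a_b_c_d
print(derive_job_name("$$$.mp4"))                   # video
```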


@@ -259,6 +259,15 @@ class Interfacer:
Raises on any failure — the caller in :meth:`process_job` handles
the transition to the failed state and the quarantine move.
During inference the interfacer updates the job's
:class:`JobStatusEntry` roughly every
:attr:`~neuropose.config.Settings.status_checkpoint_every_frames`
frames, so :mod:`neuropose.monitor` can render a live progress
bar for collaborators. Each checkpoint is a full atomic rewrite
of ``status.json`` via :func:`save_status`, which is
acceptable because scheduled pose inference is many orders of
magnitude more expensive than the write itself.
"""
job_in_path = self._settings.input_dir / job_name
job_out_path = self._settings.output_dir / job_name
@@ -271,15 +280,58 @@ class Interfacer:
f"(accepted extensions: {sorted(VIDEO_EXTENSIONS)})"
)
videos_total = len(videos)
# Seed the initial progress checkpoint so the monitor shows
# "videos_total=N, videos_completed=0" from the first poll after
# the job starts, rather than waiting until a callback fires.
self._checkpoint_progress(
job_name,
started_at=started_at,
current_video=videos[0].name,
frames_processed=0,
frames_total=None,
videos_completed=0,
videos_total=videos_total,
)
per_video_predictions = {}
checkpoint_every = self._settings.status_checkpoint_every_frames
for video_index, video_path in enumerate(videos):
if self._stop:
raise JobProcessingError(
f"stop requested mid-job after processing "
f"{len(per_video_predictions)}/{videos_total} videos"
)
logger.info("[%s] Processing video %s", job_name, video_path.name)
def _on_frame(
processed: int,
total_hint: int,
*,
# Bind the loop-local values as defaults so each iteration's
# closure is self-contained — Python closures are late-binding
# over loop variables, so this guards against the callback
# being held and invoked after the loop advances.
_job_name: str = job_name,
_started_at: datetime = started_at,
_current_video: str = video_path.name,
_video_index: int = video_index,
_videos_total: int = videos_total,
_checkpoint_every: int = checkpoint_every,
) -> None:
if processed % _checkpoint_every != 0:
return
self._checkpoint_progress(
_job_name,
started_at=_started_at,
current_video=_current_video,
frames_processed=processed,
frames_total=total_hint if total_hint > 0 else None,
videos_completed=_video_index,
videos_total=_videos_total,
)
result = self._estimator.process_video(video_path, progress=_on_frame)
per_video_predictions[video_path.name] = result.predictions
logger.info(
"[%s] Processed %s (%d frames)",
@@ -287,6 +339,19 @@ class Interfacer:
video_path.name,
result.frame_count,
)
# Post-video checkpoint: snap videos_completed to the end of
# this video even if the last frame didn't fall on the
# checkpoint cadence, so the monitor's "N / M videos done"
# line is always exact after a video finishes.
self._checkpoint_progress(
job_name,
started_at=started_at,
current_video=video_path.name,
frames_processed=result.frame_count,
frames_total=result.frame_count,
videos_completed=video_index + 1,
videos_total=videos_total,
)
job_results = JobResults(root=per_video_predictions)
results_path = job_out_path / "results.json"
@@ -298,8 +363,73 @@ class Interfacer:
started_at=started_at,
completed_at=datetime.now(UTC),
results_path=results_path,
videos_completed=videos_total,
videos_total=videos_total,
percent_complete=100.0,
last_update=datetime.now(UTC),
)
def _checkpoint_progress(
self,
job_name: str,
*,
started_at: datetime,
current_video: str,
frames_processed: int,
frames_total: int | None,
videos_completed: int,
videos_total: int,
) -> None:
"""Rewrite ``status.json`` with the current per-job progress.
Computes ``percent_complete`` across the whole job: videos that
have fully finished contribute ``1.0`` each, and the current
video contributes ``frames_processed / frames_total`` if the
frame-count hint is known, else a partial-credit estimate of
``0.5``. The sum of those contributions is then divided by
``videos_total`` and scaled to 0-100.
Never raises on an I/O error — progress checkpoints are
best-effort. If the write fails we log and move on so the
inference loop keeps making forward progress.
"""
if frames_total and frames_total > 0:
current_fraction = frames_processed / frames_total
elif frames_processed > 0:
current_fraction = 0.5
else:
current_fraction = 0.0
overall_fraction = (videos_completed + current_fraction) / max(videos_total, 1)
percent = max(0.0, min(100.0, overall_fraction * 100.0))
try:
status = load_status(self._settings.status_file)
existing = status.root.get(job_name)
if existing is None:
# The entry should have been seeded by process_job before
# _run_job_inner was called. If it isn't there, skip the
# checkpoint — something else has already deleted the
# entry and we do not want to recreate a ghost.
return
status.root[job_name] = existing.model_copy(
update={
"current_video": current_video,
"frames_processed": frames_processed,
"frames_total": frames_total,
"videos_completed": videos_completed,
"videos_total": videos_total,
"percent_complete": percent,
"last_update": datetime.now(UTC),
}
)
save_status(self._settings.status_file, status)
except Exception:
logger.warning(
"[%s] Failed to checkpoint progress; continuing inference",
job_name,
exc_info=True,
)
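The percent arithmetic above can be checked in isolation with a plain-function sketch (hypothetical standalone version of the computation, not the method itself):

```python
def percent_complete(frames_processed, frames_total, videos_completed, videos_total):
    # Finished videos count as 1.0 each; the in-flight video contributes
    # its frame fraction, or 0.5 partial credit when the total is unknown.
    if frames_total and frames_total > 0:
        current = frames_processed / frames_total
    elif frames_processed > 0:
        current = 0.5
    else:
        current = 0.0
    overall = (videos_completed + current) / max(videos_total, 1)
    return max(0.0, min(100.0, overall * 100.0))

print(percent_complete(50, 200, 1, 4))   # 31.25: one video done + a quarter of the next
print(percent_complete(10, None, 0, 2))  # 25.0: unknown frame total -> 0.5 credit
```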
def _discover_new_jobs(self, status: StatusFile) -> list[str]:
"""Return names of job subdirectories not yet tracked in ``status``.


@@ -515,7 +515,24 @@ class JobResults(RootModel[dict[str, VideoPredictions]]):
class JobStatusEntry(BaseModel):
"""Status entry for a single job in the persistent status file.
The progress fields (``current_video``, ``frames_processed``,
``frames_total``, ``videos_completed``, ``videos_total``,
``percent_complete``, ``last_update``) are populated by the
:class:`~neuropose.interfacer.Interfacer` as inference proceeds,
on a cadence driven by
:attr:`~neuropose.config.Settings.status_checkpoint_every_frames`.
They are optional so that legacy status files written before the
monitor shipped still validate on load, and so that any entry in
a terminal state (``completed`` / ``failed``) can leave them as
``None`` without confusing the monitor.
``percent_complete`` is the overall job progress in
``[0.0, 100.0]``, computed across all videos in a multi-video job
(each finished video counts as a whole unit; the in-flight video
contributes a fractional term). The monitor renders it directly
into the progress bar.
"""
model_config = ConfigDict(extra="forbid")
@@ -530,6 +547,58 @@ class JobStatusEntry(BaseModel):
"Populated by the interfacer on failure paths."
),
)
current_video: str | None = Field(
default=None,
description=(
"Filename (basename, no directory) of the video currently "
"being processed within the job. ``None`` for jobs that "
"have not started yet or that are in a terminal state."
),
)
frames_processed: int | None = Field(
default=None,
ge=0,
description="Frames processed in the current video.",
)
frames_total: int | None = Field(
default=None,
ge=0,
description=(
"OpenCV's ``CAP_PROP_FRAME_COUNT`` hint for the current "
"video. Unreliable for variable-rate videos; callers that "
"rely on this for ETA should treat it as an estimate."
),
)
videos_completed: int | None = Field(
default=None,
ge=0,
description="Number of videos finished in this job.",
)
videos_total: int | None = Field(
default=None,
ge=0,
description="Total number of videos in this job.",
)
percent_complete: float | None = Field(
default=None,
ge=0.0,
le=100.0,
description=(
"Overall job progress in ``[0.0, 100.0]``, computed across "
"all videos in the job. ``None`` for entries that have no "
"progress data yet (recently-created or pre-monitor)."
),
)
last_update: datetime | None = Field(
default=None,
description=(
"Wall-clock time of the most recent progress checkpoint. "
"Compared against ``datetime.now(UTC)`` by the monitor to "
"flag stale entries — a ``processing`` entry whose "
"``last_update`` is several minutes old may indicate a "
"wedged daemon."
),
)
class StatusFile(RootModel[dict[str, JobStatusEntry]]):

src/neuropose/monitor.py Normal file

@@ -0,0 +1,447 @@
"""Localhost HTTP status monitor for the NeuroPose daemon.
Serves a small dashboard that collaborators on the same machine can
visit in a browser to watch a batch run make progress. The page reads
the persistent :class:`~neuropose.io.StatusFile` on every request — no
new state, no in-memory cache, no sync protocol with the daemon — so a
``neuropose serve`` process is trivially safe to run alongside
``neuropose watch`` and stays useful even if the daemon is down.
Three URLs are exposed:
- ``GET /`` returns a plain HTML page with a progress bar per job,
an auto-refresh ``<meta>`` tag, and a stale-entry warning when a
processing job hasn't checkpointed in a while.
- ``GET /status.json`` returns the raw validated
:class:`~neuropose.io.StatusFile` as JSON, so any collaborator with
``curl`` (or a scripted pipeline) can consume the same data the
browser sees. ``?job=<name>`` filters to a single entry.
- ``GET /health`` is a simple ``200 OK`` so external uptime
checks can tell that the server process is running without parsing
the HTML.
By default the server binds to ``127.0.0.1:8765``. It is **not**
exposed on any external interface by default — collaborators on the
same machine reach it directly, and collaborators elsewhere should go
through an SSH tunnel or explicitly configured reverse proxy. Binding
to ``0.0.0.0`` requires an explicit ``--host`` override, because
that is a real network-exposure decision the operator should make
with eyes open.
Dependencies
------------
Pure stdlib: :mod:`http.server`, :mod:`json`, :mod:`datetime`. No
FastAPI, no Flask, no Tornado — this is a localhost tool and the
cost of a framework is not justified. Keeping it in stdlib also
means the monitor has zero runtime dependency surface that could
conflict with the rest of the project's pins.
"""
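The stdlib-only pattern the docstring describes reduces to a handler subclass plus `HTTPServer`. A minimal self-contained sketch of the `/health` round trip (the `HealthHandler` below is hypothetical; the real handler adds the dashboard and JSON routes):

```python
import json
import threading
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, HTTPServer

class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Mirror the /health contract: a tiny fixed JSON body.
        body = b'{"status":"ok"}'
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # keep the sketch quiet

# Port 0 asks the OS for any free port, so the sketch never collides
# with a live monitor on 8765.
server = HTTPServer(("127.0.0.1", 0), HealthHandler)
threading.Thread(target=server.handle_request, daemon=True).start()

conn = HTTPConnection("127.0.0.1", server.server_address[1])
conn.request("GET", "/health")
resp = conn.getresponse()
payload = json.loads(resp.read())
print(resp.status, payload)  # 200 {'status': 'ok'}
server.server_close()
```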
from __future__ import annotations
import html
import json
import logging
from datetime import UTC, datetime
from http import HTTPStatus
from http.server import BaseHTTPRequestHandler, HTTPServer
from pathlib import Path
from typing import Any
from urllib.parse import parse_qs, urlsplit
from neuropose.io import JobStatus, JobStatusEntry, StatusFile, load_status
logger = logging.getLogger(__name__)
# ---------------------------------------------------------------------------
# Tunables
# ---------------------------------------------------------------------------
DEFAULT_HOST = "127.0.0.1"
DEFAULT_PORT = 8765
HTML_REFRESH_SECONDS = 5
"""How often the HTML page auto-refreshes via ``<meta http-equiv>``."""
STALE_THRESHOLD_SECONDS = 60
"""A ``processing`` entry with ``last_update`` older than this many
seconds is flagged as stale in the HTML. 60 s is 20x the default
checkpoint cadence at 10 fps inference, so anything beyond it is very
likely a wedged daemon rather than normal jitter."""
# ---------------------------------------------------------------------------
# HTML rendering
# ---------------------------------------------------------------------------
_HTML_TEMPLATE = """\
<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta http-equiv="refresh" content="{refresh}">
<title>NeuroPose job status</title>
<style>
body {{
font-family: -apple-system, BlinkMacSystemFont, "Segoe UI",
"Helvetica Neue", Arial, sans-serif;
max-width: 960px;
margin: 2em auto;
padding: 0 1em;
color: #1a1a1a;
}}
h1 {{ margin-bottom: 0.25em; }}
.subtitle {{ color: #555; margin-top: 0; font-size: 0.9em; }}
.empty {{
padding: 2em;
background: #f6f6f6;
border-radius: 8px;
text-align: center;
color: #555;
}}
table {{
width: 100%;
border-collapse: collapse;
margin-top: 1em;
}}
th, td {{
text-align: left;
padding: 0.6em 0.8em;
border-bottom: 1px solid #eee;
vertical-align: top;
}}
th {{
font-weight: 600;
background: #fafafa;
border-bottom: 2px solid #ccc;
}}
progress {{ width: 100%; height: 1em; }}
.status-processing {{ color: #1765bd; font-weight: 600; }}
.status-completed {{ color: #2a8f3a; font-weight: 600; }}
.status-failed {{ color: #b93232; font-weight: 600; }}
.stale {{
display: inline-block;
background: #fff3cd;
color: #8a6d1b;
padding: 0 0.4em;
border-radius: 4px;
font-size: 0.8em;
margin-left: 0.4em;
}}
.error-msg {{
font-family: ui-monospace, "SF Mono", Menlo, Consolas, monospace;
font-size: 0.85em;
color: #b93232;
white-space: pre-wrap;
word-break: break-word;
}}
.muted {{ color: #888; font-size: 0.85em; }}
</style>
</head>
<body>
<h1>NeuroPose job status</h1>
<p class="subtitle">
Reading {status_path} &middot;
as of {now_iso} &middot;
auto-refresh every {refresh} s &middot;
<a href="/status.json">status.json</a>
</p>
{body}
</body>
</html>
"""
def render_status_html(status: StatusFile, *, status_path: Path, now: datetime) -> str:
"""Render the full HTML dashboard for ``status``."""
if status.is_empty():
body = (
'<div class="empty">No jobs tracked yet. '
"Ingest a zip archive or drop a job directory into "
"<code>$data_dir/in/</code> to get started.</div>"
)
else:
rows = "\n".join(_render_row(name, entry, now=now) for name, entry in status.root.items())
body = (
"<table>\n"
"<thead><tr>"
"<th>Job</th>"
"<th>Status</th>"
"<th>Progress</th>"
"<th>Current video</th>"
"<th>Started</th>"
"<th>Last update</th>"
"</tr></thead>\n"
f"<tbody>\n{rows}\n</tbody>\n"
"</table>\n"
)
return _HTML_TEMPLATE.format(
refresh=HTML_REFRESH_SECONDS,
status_path=html.escape(str(status_path)),
now_iso=html.escape(now.isoformat(timespec="seconds")),
body=body,
)
def _render_row(name: str, entry: JobStatusEntry, *, now: datetime) -> str:
"""Render one ``<tr>`` for a single job entry."""
status_class = f"status-{entry.status.value}"
status_cell = f'<span class="{status_class}">{html.escape(entry.status.value)}</span>'
stale_badge = ""
if entry.status == JobStatus.PROCESSING and entry.last_update is not None:
age_seconds = (now - entry.last_update).total_seconds()
if age_seconds > STALE_THRESHOLD_SECONDS:
stale_badge = f'<span class="stale">stale — no update for {int(age_seconds)} s</span>'
status_cell += stale_badge
progress_cell = _render_progress_cell(entry)
current_video = html.escape(entry.current_video or "")
if entry.videos_total is not None and entry.videos_completed is not None:
current_video += (
f' <span class="muted">({entry.videos_completed}/'
f"{entry.videos_total} videos done)</span>"
)
started_cell = html.escape(entry.started_at.isoformat(timespec="seconds"))
last_update_cell = (
html.escape(entry.last_update.isoformat(timespec="seconds"))
if entry.last_update is not None
else '<span class="muted">—</span>'
)
error_row = ""
if entry.status == JobStatus.FAILED and entry.error:
error_row = (
f'<tr><td colspan="6" class="error-msg">error: {html.escape(entry.error)}</td></tr>'
)
return (
f"<tr>"
f"<td>{html.escape(name)}</td>"
f"<td>{status_cell}</td>"
f"<td>{progress_cell}</td>"
f"<td>{current_video}</td>"
f"<td>{started_cell}</td>"
f"<td>{last_update_cell}</td>"
f"</tr>"
f"{error_row}"
)
def _render_progress_cell(entry: JobStatusEntry) -> str:
"""Render the progress-bar cell for one job."""
if entry.status == JobStatus.COMPLETED:
return '<progress value="100" max="100"></progress> 100%'
if entry.status == JobStatus.FAILED:
return '<span class="muted">—</span>'
if entry.percent_complete is None:
return '<progress max="100"></progress> <span class="muted">starting…</span>'
pct = entry.percent_complete
detail = ""
if entry.frames_processed is not None and entry.frames_total:
detail = (
f' <span class="muted">'
f"({entry.frames_processed}/{entry.frames_total} frames"
f"{_render_eta(entry)})</span>"
)
return f'<progress value="{pct:.1f}" max="100"></progress> {pct:.1f}%{detail}'
def _render_eta(entry: JobStatusEntry) -> str:
"""Return a ``, ETA ~<duration>`` suffix if an ETA can be computed."""
if entry.started_at is None or entry.percent_complete is None:
return ""
if entry.percent_complete <= 0.0 or entry.percent_complete >= 100.0:
return ""
now = datetime.now(UTC)
elapsed = (now - entry.started_at).total_seconds()
if elapsed <= 0.0:
return ""
fraction = entry.percent_complete / 100.0
total_estimated = elapsed / fraction
remaining = max(0.0, total_estimated - elapsed)
return f", ETA ~{_format_duration(remaining)}"
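A worked instance of the extrapolation above: with 25% complete after 30 s of elapsed time, the linear estimate is

```python
# fraction = percent_complete / 100; total is estimated as elapsed / fraction.
elapsed = 30.0
fraction = 0.25
total_estimated = elapsed / fraction   # 120.0 s projected for the whole job
remaining = max(0.0, total_estimated - elapsed)
print(remaining)  # 90.0
```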
def _format_duration(seconds: float) -> str:
"""Format a duration in seconds as ``HH:MM:SS`` or ``MM:SS`` or ``SS s``."""
seconds_int = round(seconds)
if seconds_int < 60:
return f"{seconds_int} s"
if seconds_int < 3600:
m, s = divmod(seconds_int, 60)
return f"{m}:{s:02d}"
h, rem = divmod(seconds_int, 3600)
m, s = divmod(rem, 60)
return f"{h}:{m:02d}:{s:02d}"
# ---------------------------------------------------------------------------
# Request handler
# ---------------------------------------------------------------------------
class _StatusRequestHandler(BaseHTTPRequestHandler):
"""HTTP handler that serves the status dashboard and JSON.
The status file path is injected via the server's ``status_path``
attribute (see :func:`build_server`). Using a subclass + instance
attribute avoids a module-level global and keeps the handler
trivially parametrisable for tests.
"""
# Silence the stdlib server's default stderr access log; route it
# through the package logger instead so operators can tune it like
# any other neuropose module.
def log_message(self, format: str, *args: Any) -> None:
logger.debug("%s - - %s", self.address_string(), format % args)
def do_GET(self) -> None:
parsed = urlsplit(self.path)
path = parsed.path
if path == "/":
self._serve_html()
elif path == "/status.json":
self._serve_json(parse_qs(parsed.query))
elif path == "/health":
self._serve_health()
else:
self.send_error(HTTPStatus.NOT_FOUND, "unknown path")
def _serve_html(self) -> None:
status = self._load_status()
body = render_status_html(
status,
status_path=self.server.status_path, # type: ignore[attr-defined]
now=datetime.now(UTC),
).encode("utf-8")
self.send_response(HTTPStatus.OK)
self.send_header("Content-Type", "text/html; charset=utf-8")
self.send_header("Content-Length", str(len(body)))
self.send_header("Cache-Control", "no-store")
self.end_headers()
self.wfile.write(body)
def _serve_json(self, query: dict[str, list[str]]) -> None:
status = self._load_status()
payload: Any = status.model_dump(mode="json")
job_filter = query.get("job", [None])[0]
if job_filter is not None:
if job_filter not in status.root:
self.send_error(
HTTPStatus.NOT_FOUND,
f"no such job: {job_filter}",
)
return
payload = status.root[job_filter].model_dump(mode="json")
body = json.dumps(payload, indent=2).encode("utf-8")
self.send_response(HTTPStatus.OK)
self.send_header("Content-Type", "application/json; charset=utf-8")
self.send_header("Content-Length", str(len(body)))
self.send_header("Cache-Control", "no-store")
self.end_headers()
self.wfile.write(body)
def _serve_health(self) -> None:
body = b'{"status":"ok"}'
self.send_response(HTTPStatus.OK)
self.send_header("Content-Type", "application/json")
self.send_header("Content-Length", str(len(body)))
self.end_headers()
self.wfile.write(body)
def _load_status(self) -> StatusFile:
"""Load the current status file.
The file is read on every request — no in-memory cache — which
is cheap in absolute terms (the file is small) and means the
monitor never serves stale data relative to the daemon's most
recent atomic write.
"""
path: Path = self.server.status_path # type: ignore[attr-defined]
return load_status(path)
# ---------------------------------------------------------------------------
# Server construction and lifecycle
# ---------------------------------------------------------------------------
class _StatusServer(HTTPServer):
"""HTTPServer subclass that carries the status file path.
The path is needed inside every request handler, and the stdlib
server lets subclasses add arbitrary attributes — cleaner than a
module-level global or a closure-captured handler class.
"""
status_path: Path
def __init__(
self,
server_address: tuple[str, int],
status_path: Path,
) -> None:
super().__init__(server_address, _StatusRequestHandler)
self.status_path = status_path
def build_server(
status_path: Path,
*,
host: str = DEFAULT_HOST,
port: int = DEFAULT_PORT,
) -> _StatusServer:
"""Construct (but do not start) a status monitor HTTP server.
Parameters
----------
status_path
Path to the daemon's ``status.json``. Typically
``Settings.status_file``.
host
Interface to bind. Defaults to ``127.0.0.1`` — explicitly
loopback — so an unconfigured monitor cannot be reached over the
network without an operator opting in. Pass ``0.0.0.0`` or a
specific external IP only when you have thought through the
exposure.
port
TCP port to listen on. Defaults to 8765.
Returns
-------
_StatusServer
A ready-to-serve HTTP server. Call ``.serve_forever()`` to
block, or ``.handle_request()`` once for tests.
"""
return _StatusServer((host, port), status_path)
def serve_forever(
status_path: Path,
*,
host: str = DEFAULT_HOST,
port: int = DEFAULT_PORT,
) -> None:
"""Start the monitor and block until the process is interrupted.
Logs the bound address once on startup so operators can copy it
into a browser. Catches :class:`KeyboardInterrupt` and shuts the
server down cleanly so Ctrl-C produces a quick, no-traceback exit.
"""
server = build_server(status_path, host=host, port=port)
logger.info(
"NeuroPose monitor listening on http://%s:%d/ (reading %s)",
host,
port,
status_path,
)
try:
server.serve_forever()
except KeyboardInterrupt:
logger.info("Interrupt received; shutting down monitor")
finally:
server.server_close()


@@ -92,6 +92,8 @@ class TestTopLevelOptions:
assert result.exit_code in (EXIT_OK, EXIT_USAGE)
assert "watch" in result.output
assert "process" in result.output
assert "ingest" in result.output
assert "serve" in result.output
assert "segment" in result.output
assert "benchmark" in result.output
assert "analyze" in result.output
@@ -102,7 +104,15 @@
assert "NeuroPose" in result.output
def test_subcommand_help(self, runner: CliRunner) -> None:
for subcommand in (
"watch",
"process",
"ingest",
"serve",
"segment",
"benchmark",
"analyze",
):
result = runner.invoke(app, [subcommand, "--help"])
assert result.exit_code == EXIT_OK, f"{subcommand} --help failed"
@@ -215,6 +225,158 @@ class TestProcess:
assert "commit 11" in result.output
# ---------------------------------------------------------------------------
# ingest
# ---------------------------------------------------------------------------
def _write_zip_for_cli(path: Path, members: dict[str, bytes]) -> Path:
"""Build a zip fixture for the CLI ingest tests."""
import zipfile
with zipfile.ZipFile(path, "w") as z:
for name, data in members.items():
z.writestr(name, data)
return path
class TestIngestSubcommand:
def test_ingest_happy_path_creates_job_dirs(
self,
runner: CliRunner,
tmp_path: Path,
xdg_home: Path,
) -> None:
archive = _write_zip_for_cli(
tmp_path / "session.zip",
{"clip_01.mp4": b"video one", "clip_02.mp4": b"video two"},
)
result = runner.invoke(app, ["ingest", str(archive)])
assert result.exit_code == EXIT_OK, result.output
assert "ingested 2 job(s)" in result.output
# Default Settings.data_dir is $XDG_DATA_HOME/neuropose/jobs, and
# input_dir = data_dir/in.
input_dir = xdg_home / "neuropose" / "jobs" / "in"
assert (input_dir / "clip_01" / "clip_01.mp4").read_bytes() == b"video one"
assert (input_dir / "clip_02" / "clip_02.mp4").read_bytes() == b"video two"
def test_ingest_reports_skipped_non_videos(
self,
runner: CliRunner,
tmp_path: Path,
) -> None:
archive = _write_zip_for_cli(
tmp_path / "a.zip",
{"clip.mp4": b"v", "README.md": b"readme"},
)
result = runner.invoke(app, ["ingest", str(archive)])
assert result.exit_code == EXIT_OK, result.output
assert "1 non-video member(s) skipped" in result.output
def test_ingest_missing_archive_is_usage_error(
self,
runner: CliRunner,
tmp_path: Path,
) -> None:
result = runner.invoke(app, ["ingest", str(tmp_path / "nope.zip")])
assert result.exit_code == EXIT_USAGE
def test_ingest_empty_archive_is_usage_error(
self,
runner: CliRunner,
tmp_path: Path,
) -> None:
archive = _write_zip_for_cli(tmp_path / "empty.zip", {})
result = runner.invoke(app, ["ingest", str(archive)])
assert result.exit_code == EXIT_USAGE
assert "no video files" in result.output.lower()
def test_ingest_collision_without_force_is_usage_error(
self,
runner: CliRunner,
tmp_path: Path,
xdg_home: Path,
) -> None:
archive = _write_zip_for_cli(tmp_path / "a.zip", {"clip.mp4": b"v"})
# The first ingest creates the job directory; the second run
# then collides with it.
runner.invoke(app, ["ingest", str(archive)])
result = runner.invoke(app, ["ingest", str(archive)])
assert result.exit_code == EXIT_USAGE
assert "already exist" in result.output.lower()
assert "--force" in result.output
def test_ingest_force_overwrites(
self,
runner: CliRunner,
tmp_path: Path,
xdg_home: Path,
) -> None:
archive = _write_zip_for_cli(tmp_path / "a.zip", {"clip.mp4": b"original"})
first = runner.invoke(app, ["ingest", str(archive)])
assert first.exit_code == EXIT_OK, first.output
# Build a new archive with the same job name but different bytes.
archive2 = _write_zip_for_cli(tmp_path / "b.zip", {"clip.mp4": b"overwritten"})
second = runner.invoke(app, ["ingest", str(archive2), "--force"])
assert second.exit_code == EXIT_OK, second.output
input_dir = xdg_home / "neuropose" / "jobs" / "in"
assert (input_dir / "clip" / "clip.mp4").read_bytes() == b"overwritten"
def test_ingest_traversal_rejected_as_usage_error(
self,
runner: CliRunner,
tmp_path: Path,
) -> None:
archive = _write_zip_for_cli(tmp_path / "a.zip", {"../escape.mp4": b"v"})
result = runner.invoke(app, ["ingest", str(archive)])
assert result.exit_code == EXIT_USAGE
assert "traversal" in result.output.lower()
# ---------------------------------------------------------------------------
# serve
# ---------------------------------------------------------------------------
class TestServeSubcommand:
def test_serve_exits_cleanly_on_keyboard_interrupt(
self,
runner: CliRunner,
monkeypatch: pytest.MonkeyPatch,
) -> None:
"""``neuropose serve`` should translate Ctrl-C into EXIT_INTERRUPTED.
We patch ``serve_forever`` to raise ``KeyboardInterrupt``
immediately, which simulates a Ctrl-C before any request is
served. The CLI's handler should map that to the standard
shell-interruption exit code.
"""
def fake_serve_forever(status_path, *, host, port) -> None:
del status_path, host, port
raise KeyboardInterrupt
monkeypatch.setattr("neuropose.monitor.serve_forever", fake_serve_forever)
result = runner.invoke(app, ["serve"])
assert result.exit_code == EXIT_INTERRUPTED
def test_serve_bind_error_is_usage_error(
self,
runner: CliRunner,
monkeypatch: pytest.MonkeyPatch,
) -> None:
"""A port-already-in-use OSError should surface as EXIT_USAGE."""
def fake_serve_forever(status_path, *, host, port) -> None:
del status_path, host, port
raise OSError("address already in use")
monkeypatch.setattr("neuropose.monitor.serve_forever", fake_serve_forever)
result = runner.invoke(app, ["serve"])
assert result.exit_code == EXIT_USAGE
assert "could not bind" in result.output.lower()
# ---------------------------------------------------------------------------
# segment
# ---------------------------------------------------------------------------

tests/unit/test_ingest.py Normal file

@@ -0,0 +1,292 @@
"""Tests for :mod:`neuropose.ingest`.
Coverage:
- Happy path — nested and top-level videos produce one job each.
- Job-name derivation — flattening, sanitization, collapsing.
- Non-video members are skipped, not errors.
- Zip-internal collisions (two videos flattening to the same job
name) reported up front.
- External collisions (target job dir already exists) are listed in
one error; ``--force`` deletes and replaces.
- Security: path-traversal and absolute-path members refused; empty
archive and oversize archive refused.
- Atomicity: when extraction fails midway, no partial state is left
behind under ``input_dir``.
"""
from __future__ import annotations
import zipfile
from pathlib import Path
import pytest
from neuropose.ingest import (
ArchiveEmptyError,
ArchiveInvalidError,
ArchiveTooLargeError,
IngestResult,
JobCollisionError,
ingest_zip,
)
def _write_zip(path: Path, members: dict[str, bytes]) -> Path:
"""Create a zip at ``path`` with the given ``{name: bytes}`` members."""
with zipfile.ZipFile(path, "w") as z:
for name, data in members.items():
z.writestr(name, data)
return path
@pytest.fixture
def input_dir(tmp_path: Path) -> Path:
"""Return a fresh ``input_dir`` for the test."""
d = tmp_path / "jobs" / "in"
d.mkdir(parents=True)
return d
# ---------------------------------------------------------------------------
# Happy path
# ---------------------------------------------------------------------------
class TestHappyPath:
def test_top_level_video_becomes_job(self, tmp_path: Path, input_dir: Path) -> None:
archive = _write_zip(tmp_path / "a.zip", {"clip_01.mp4": b"data"})
result = ingest_zip(archive, input_dir)
assert result.job_count == 1
assert result.ingested[0].job_name == "clip_01"
assert (input_dir / "clip_01" / "clip_01.mp4").read_bytes() == b"data"
def test_nested_path_flattens_into_job_name(self, tmp_path: Path, input_dir: Path) -> None:
archive = _write_zip(
tmp_path / "a.zip",
{"patient_001/trial_01.mp4": b"vid"},
)
result = ingest_zip(archive, input_dir)
job = result.ingested[0]
assert job.job_name == "patient_001_trial_01"
# The video file inside the job dir keeps its basename, not the
# flattened job name, so the daemon sees a clean filename.
assert job.video_filename == "trial_01.mp4"
assert (input_dir / "patient_001_trial_01" / "trial_01.mp4").exists()
def test_sibling_nested_videos_do_not_collide(self, tmp_path: Path, input_dir: Path) -> None:
archive = _write_zip(
tmp_path / "a.zip",
{
"patient_001/trial_01.mp4": b"a",
"patient_002/trial_01.mp4": b"b",
},
)
result = ingest_zip(archive, input_dir)
names = {j.job_name for j in result.ingested}
assert names == {"patient_001_trial_01", "patient_002_trial_01"}
def test_multiple_videos_produce_multiple_jobs(self, tmp_path: Path, input_dir: Path) -> None:
archive = _write_zip(
tmp_path / "a.zip",
{f"clip_{i:02d}.mp4": f"d{i}".encode() for i in range(5)},
)
result = ingest_zip(archive, input_dir)
assert result.job_count == 5
assert len(list(input_dir.iterdir())) == 5
def test_non_video_members_skipped(self, tmp_path: Path, input_dir: Path) -> None:
archive = _write_zip(
tmp_path / "a.zip",
{
"clip.mp4": b"video",
"README.md": b"notes",
".DS_Store": b"junk",
"notes.txt": b"notes",
},
)
result = ingest_zip(archive, input_dir)
assert result.job_count == 1
assert sorted(result.skipped_non_videos) == sorted([".DS_Store", "README.md", "notes.txt"])
def test_all_accepted_extensions(self, tmp_path: Path, input_dir: Path) -> None:
archive = _write_zip(
tmp_path / "a.zip",
{
"a.mp4": b"a",
"b.avi": b"b",
"c.mov": b"c",
"d.mkv": b"d",
},
)
result = ingest_zip(archive, input_dir)
assert result.job_count == 4
def test_returns_typed_result(self, tmp_path: Path, input_dir: Path) -> None:
archive = _write_zip(tmp_path / "a.zip", {"clip.mp4": b"data"})
result = ingest_zip(archive, input_dir)
assert isinstance(result, IngestResult)
assert result.total_uncompressed_bytes == 4
# ---------------------------------------------------------------------------
# Job-name sanitization
# ---------------------------------------------------------------------------
class TestJobNameDerivation:
def test_special_chars_become_underscores(self, tmp_path: Path, input_dir: Path) -> None:
archive = _write_zip(
tmp_path / "a.zip",
{"session 2026-04-15 / trial @1.mp4": b"v"},
)
result = ingest_zip(archive, input_dir)
name = result.ingested[0].job_name
# Every character ends up in the safe set; runs of underscores
# are collapsed and leading/trailing stripped.
assert name == "session_2026-04-15_trial_1"
def test_all_symbol_name_falls_back_to_video(self, tmp_path: Path, input_dir: Path) -> None:
archive = _write_zip(tmp_path / "a.zip", {"!!!.mp4": b"v"})
result = ingest_zip(archive, input_dir)
assert result.ingested[0].job_name == "video"
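The derivation behaviour these tests describe (flatten the path, map unsafe characters to underscores, collapse runs, strip the ends, fall back to a default) can be sketched as below. This is an illustration consistent with the assertions above, not the actual `neuropose.ingest` code; the exact safe character set and fallback name are assumptions:

```python
import re

# Characters outside [A-Za-z0-9_-] are treated as unsafe (assumption).
_UNSAFE = re.compile(r"[^A-Za-z0-9_-]+")


def derive_job_name(member: str) -> str:
    """Derive a job name from a zip member path, per the tests above."""
    stem = member.rsplit(".", 1)[0]        # drop the extension
    name = stem.replace("/", "_")          # flatten nested paths
    name = _UNSAFE.sub("_", name)          # unsafe runs -> underscore
    name = re.sub(r"_+", "_", name)        # collapse underscore runs
    name = name.strip("_")                 # strip leading/trailing
    return name or "video"                 # all-symbol names fall back
```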
# ---------------------------------------------------------------------------
# Collision detection
# ---------------------------------------------------------------------------
class TestCollisions:
def test_zip_internal_collision_rejects(self, tmp_path: Path, input_dir: Path) -> None:
# "x__y.mp4" and "x y.mp4" both sanitize to the job name "x_y"
# (the space becomes an underscore and runs collapse), so the
# archive is internally inconsistent and must be rejected before
# anything is written.
archive = _write_zip(
tmp_path / "b.zip",
{"x__y.mp4": b"1", "x y.mp4": b"2"},
)
with pytest.raises(JobCollisionError):
ingest_zip(archive, input_dir)
# No files written.
assert list(input_dir.iterdir()) == []
def test_external_collision_without_force(self, tmp_path: Path, input_dir: Path) -> None:
(input_dir / "clip").mkdir()
(input_dir / "clip" / "existing.mp4").write_bytes(b"old")
archive = _write_zip(tmp_path / "a.zip", {"clip.mp4": b"new"})
with pytest.raises(JobCollisionError) as excinfo:
ingest_zip(archive, input_dir)
assert excinfo.value.collisions == ["clip"]
# Existing job dir is untouched.
assert (input_dir / "clip" / "existing.mp4").read_bytes() == b"old"
def test_external_collision_listed_together(self, tmp_path: Path, input_dir: Path) -> None:
for name in ("a", "b", "c"):
(input_dir / name).mkdir()
archive = _write_zip(
tmp_path / "a.zip",
{"a.mp4": b"1", "b.mp4": b"2", "c.mp4": b"3", "d.mp4": b"4"},
)
with pytest.raises(JobCollisionError) as excinfo:
ingest_zip(archive, input_dir)
assert sorted(excinfo.value.collisions) == ["a", "b", "c"]
def test_force_overwrites_existing(self, tmp_path: Path, input_dir: Path) -> None:
(input_dir / "clip").mkdir()
(input_dir / "clip" / "existing.mp4").write_bytes(b"old")
archive = _write_zip(tmp_path / "a.zip", {"clip.mp4": b"new"})
result = ingest_zip(archive, input_dir, force=True)
assert result.job_count == 1
# The old file is gone; only the new one remains.
files = list((input_dir / "clip").iterdir())
assert [f.name for f in files] == ["clip.mp4"]
assert (input_dir / "clip" / "clip.mp4").read_bytes() == b"new"
# ---------------------------------------------------------------------------
# Security
# ---------------------------------------------------------------------------
class TestSecurity:
def test_absolute_path_member_rejected(self, tmp_path: Path, input_dir: Path) -> None:
archive = _write_zip(tmp_path / "a.zip", {"/etc/passwd.mp4": b"x"})
with pytest.raises(ArchiveInvalidError, match="absolute"):
ingest_zip(archive, input_dir)
assert list(input_dir.iterdir()) == []
def test_traversal_member_rejected(self, tmp_path: Path, input_dir: Path) -> None:
archive = _write_zip(tmp_path / "a.zip", {"../escape.mp4": b"x"})
with pytest.raises(ArchiveInvalidError, match="traversal"):
ingest_zip(archive, input_dir)
assert list(input_dir.iterdir()) == []
def test_empty_archive_rejected(self, tmp_path: Path, input_dir: Path) -> None:
archive = _write_zip(tmp_path / "a.zip", {})
with pytest.raises(ArchiveEmptyError):
ingest_zip(archive, input_dir)
def test_archive_with_only_non_videos_rejected(self, tmp_path: Path, input_dir: Path) -> None:
archive = _write_zip(
tmp_path / "a.zip",
{"README.md": b"no videos here", "notes.txt": b"none"},
)
with pytest.raises(ArchiveEmptyError):
ingest_zip(archive, input_dir)
def test_too_large_archive_rejected(
self,
tmp_path: Path,
input_dir: Path,
monkeypatch: pytest.MonkeyPatch,
) -> None:
# Lower the cap for the test rather than building a real
# multi-GB zip. The enforcement path is the same.
monkeypatch.setattr("neuropose.ingest.MAX_UNCOMPRESSED_BYTES", 10)
archive = _write_zip(
tmp_path / "a.zip",
{"clip.mp4": b"0123456789ABCDEF"}, # 16 bytes > 10
)
with pytest.raises(ArchiveTooLargeError):
ingest_zip(archive, input_dir)
assert list(input_dir.iterdir()) == []
def test_bad_zip_file_rejected(self, tmp_path: Path, input_dir: Path) -> None:
bad = tmp_path / "bad.zip"
bad.write_bytes(b"this is not a valid zip file at all")
with pytest.raises(ArchiveInvalidError):
ingest_zip(bad, input_dir)
def test_missing_archive_raises(self, tmp_path: Path, input_dir: Path) -> None:
with pytest.raises(FileNotFoundError):
ingest_zip(tmp_path / "nope.zip", input_dir)
# ---------------------------------------------------------------------------
# Atomicity
# ---------------------------------------------------------------------------
class TestAtomicity:
def test_staging_directory_cleaned_up_on_success(self, tmp_path: Path, input_dir: Path) -> None:
archive = _write_zip(tmp_path / "a.zip", {"clip.mp4": b"v"})
ingest_zip(archive, input_dir)
# No stray `.ingest_*` directories left under the parent.
leftover = [p for p in input_dir.parent.iterdir() if p.name.startswith(".ingest_")]
assert leftover == []
def test_no_partial_state_when_planning_fails(self, tmp_path: Path, input_dir: Path) -> None:
# An archive that will pass the zipfile open but fail at
# planning (traversal member) should never write to input_dir.
archive = _write_zip(tmp_path / "a.zip", {"../bad.mp4": b"v"})
with pytest.raises(ArchiveInvalidError):
ingest_zip(archive, input_dir)
assert list(input_dir.iterdir()) == []
leftover = [p for p in input_dir.parent.iterdir() if p.name.startswith(".ingest_")]
assert leftover == []
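The atomicity tests above imply a stage-then-publish extraction scheme: write into a hidden `.ingest_*` staging directory next to `input_dir`, move job directories into place only after every member extracted cleanly, and always clean up the staging area. A self-contained sketch of that pattern (the `extract` callable and the exact staging layout are assumptions, not the real `neuropose.ingest` internals):

```python
import os
import shutil
import tempfile
from pathlib import Path


def ingest_atomically(extract, input_dir: Path) -> None:
    """Run ``extract(staging)`` and publish job dirs only on success.

    ``extract`` is a hypothetical callable that writes one directory
    per job into ``staging`` and raises on any invalid member.
    """
    staging = Path(tempfile.mkdtemp(prefix=".ingest_", dir=input_dir.parent))
    try:
        extract(staging)  # may raise midway; nothing published yet
        for job_dir in sorted(staging.iterdir()):
            # Same-filesystem rename: the job appears all at once.
            os.replace(job_dir, input_dir / job_dir.name)
    finally:
        shutil.rmtree(staging, ignore_errors=True)  # no .ingest_* leftovers
```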


@@ -45,8 +45,14 @@ class _RaisingEstimator:
     def load_model(self, cache_dir: Path | None = None) -> None:
         del cache_dir

-    def process_video(self, video_path: Path) -> Any:
-        del video_path
+    def process_video(
+        self,
+        video_path: Path,
+        *,
+        progress: Any = None,
+        **_: Any,
+    ) -> Any:
+        del video_path, progress
         raise self._exc
@@ -249,6 +255,142 @@ class TestProcessJobSuccess:
assert not (settings.failed_dir / "job_a").exists()
class TestProgressCheckpointing:
def test_completed_entry_has_full_progress(
self,
tmp_path: Path,
synthetic_video: Path,
fake_metrabs_model,
) -> None:
settings = _make_settings(tmp_path)
_prepare_job(settings, "job_a", videos=[synthetic_video])
interfacer = Interfacer(settings, Estimator(model=fake_metrabs_model))
entry = interfacer.process_job("job_a")
assert entry.percent_complete == 100.0
assert entry.videos_completed == 1
assert entry.videos_total == 1
assert entry.last_update is not None
def test_checkpoints_persist_to_status_file(
self,
tmp_path: Path,
synthetic_video: Path,
fake_metrabs_model,
) -> None:
# Force a low checkpoint cadence so the 5-frame synthetic video
# actually triggers the callback a few times. Without this the
# default cadence of 30 would not fire on a 5-frame video.
settings = Settings(
data_dir=tmp_path / "jobs",
model_cache_dir=tmp_path / "models",
status_checkpoint_every_frames=1,
)
_prepare_job(settings, "job_a", videos=[synthetic_video])
interfacer = Interfacer(settings, Estimator(model=fake_metrabs_model))
interfacer.process_job("job_a")
status = load_status(settings.status_file)
entry = status.root["job_a"]
# Final entry is COMPLETED with full progress.
assert entry.status == JobStatus.COMPLETED
assert entry.percent_complete == 100.0
def test_seed_checkpoint_sets_videos_total_before_first_callback(
self,
tmp_path: Path,
synthetic_video: Path,
fake_metrabs_model,
) -> None:
"""The seeded checkpoint should land even if no frame callbacks fire.
We wrap the real estimator so its ``process_video`` records the
``status.json`` state at the moment it is called (after the seed
checkpoint, before any frame-based callbacks run) and assert that
``videos_total`` and ``current_video`` were already written by the
seed.
"""
settings = _make_settings(tmp_path)
_prepare_job(settings, "job_a", videos=[synthetic_video])
seen_entries: list[JobStatusEntry] = []
real_estimator = Estimator(model=fake_metrabs_model)
class RecordingEstimator:
is_model_loaded = True
def load_model(self, cache_dir: Path | None = None) -> None:
del cache_dir
def process_video(
self,
video_path: Path,
*,
progress: Any = None,
**_: Any,
) -> Any:
status = load_status(settings.status_file)
seen_entries.append(status.root["job_a"])
return real_estimator.process_video(video_path, progress=progress)
interfacer = Interfacer(settings, RecordingEstimator()) # type: ignore[arg-type]
interfacer.process_job("job_a")
assert len(seen_entries) == 1
seeded = seen_entries[0]
assert seeded.videos_total == 1
assert seeded.videos_completed == 0
assert seeded.current_video == synthetic_video.name
def test_checkpoint_progress_swallows_io_errors(
self,
tmp_path: Path,
fake_metrabs_model,
monkeypatch: pytest.MonkeyPatch,
) -> None:
"""``_checkpoint_progress`` is best-effort — must not raise.
We call it directly and patch the underlying ``save_status``
to raise, then assert the call returns normally. Testing the
helper in isolation is more robust than trying to inject a
failure at the exact call-ordinal during a full ``process_job``
run.
"""
settings = _make_settings(tmp_path)
settings.ensure_dirs()
# Seed a PROCESSING entry so _checkpoint_progress has something
# to update.
status = StatusFile(
root={
"job_a": JobStatusEntry(
status=JobStatus.PROCESSING,
started_at=datetime.now(UTC),
)
}
)
save_status(settings.status_file, status)
interfacer = Interfacer(settings, Estimator(model=fake_metrabs_model))
def broken_save(path: Path, status: StatusFile) -> None:
raise OSError("disk full, simulated")
monkeypatch.setattr("neuropose.interfacer.save_status", broken_save)
# The helper must not raise — the caller (inference loop)
# continues making forward progress even if the write fails.
interfacer._checkpoint_progress(
"job_a",
started_at=datetime.now(UTC),
current_video="v.mp4",
frames_processed=10,
frames_total=100,
videos_completed=0,
videos_total=1,
)
# ---------------------------------------------------------------------------
# process_job failure paths
# ---------------------------------------------------------------------------


@@ -16,6 +16,7 @@ from neuropose.io import (
FramePrediction,
JobResults,
JobStatus,
JobStatusEntry,
JointAngleExtractor,
JointAxisExtractor,
JointPairDistanceExtractor,
@@ -525,3 +526,86 @@ class TestStatusFile:
}
}
)
class TestJobStatusEntryProgressFields:
def test_progress_fields_default_to_none(self) -> None:
entry = JobStatusEntry(
status=JobStatus.PROCESSING,
started_at=datetime(2026, 4, 13, tzinfo=UTC),
)
assert entry.current_video is None
assert entry.frames_processed is None
assert entry.frames_total is None
assert entry.videos_completed is None
assert entry.videos_total is None
assert entry.percent_complete is None
assert entry.last_update is None
def test_legacy_status_file_without_progress_loads(self, tmp_path: Path) -> None:
"""Files written before the progress fields existed must still load."""
started = datetime(2026, 4, 13, 12, 0, 0, tzinfo=UTC)
path = tmp_path / "legacy.json"
path.write_text(
json.dumps(
{
"job_001": {
"status": "completed",
"started_at": started.isoformat(),
"completed_at": started.isoformat(),
"results_path": "/tmp/results.json",
"error": None,
}
}
)
)
loaded = load_status(path)
entry = loaded.root["job_001"]
assert entry.percent_complete is None
def test_progress_roundtrips_through_json(self, tmp_path: Path) -> None:
now = datetime(2026, 4, 13, 12, 0, 0, tzinfo=UTC)
status = StatusFile(
root={
"job_001": JobStatusEntry(
status=JobStatus.PROCESSING,
started_at=now,
current_video="trial_01.mp4",
frames_processed=450,
frames_total=1200,
videos_completed=0,
videos_total=3,
percent_complete=12.5,
last_update=now,
),
}
)
path = tmp_path / "status.json"
save_status(path, status)
loaded = load_status(path)
entry = loaded.root["job_001"]
assert entry.current_video == "trial_01.mp4"
assert entry.frames_processed == 450
assert entry.percent_complete == 12.5
def test_percent_complete_rejects_out_of_range(self) -> None:
with pytest.raises(ValidationError):
JobStatusEntry(
status=JobStatus.PROCESSING,
started_at=datetime(2026, 4, 13, tzinfo=UTC),
percent_complete=150.0,
)
with pytest.raises(ValidationError):
JobStatusEntry(
status=JobStatus.PROCESSING,
started_at=datetime(2026, 4, 13, tzinfo=UTC),
percent_complete=-5.0,
)
def test_frames_processed_rejects_negative(self) -> None:
with pytest.raises(ValidationError):
JobStatusEntry(
status=JobStatus.PROCESSING,
started_at=datetime(2026, 4, 13, tzinfo=UTC),
frames_processed=-1,
)
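The roundtrip fixture above pairs ``frames_processed=450`` of ``1200`` and ``videos_completed=0`` of ``3`` with ``percent_complete=12.5``, which is consistent with a formula where finished videos count fully and the in-flight video contributes its frame fraction. A sketch of one such formula, inferred from the fixture values rather than taken from the interfacer's code:

```python
def overall_percent(
    videos_completed: int,
    frames_processed: int,
    frames_total: int,
    videos_total: int,
) -> float:
    """Overall job progress implied by the fixture values above.

    Each completed video contributes 1/videos_total; the current
    video contributes its frame fraction of that share.
    """
    in_flight = frames_processed / frames_total if frames_total else 0.0
    return 100.0 * (videos_completed + in_flight) / videos_total
```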

346 tests/unit/test_monitor.py Normal file

@@ -0,0 +1,346 @@
"""Tests for :mod:`neuropose.monitor`.
The tests boot the real :class:`http.server.HTTPServer` subclass on a
free ephemeral port in a background thread, issue real HTTP requests
with :mod:`urllib.request`, and assert on the responses. This
exercises the handler, routing, JSON serialization, HTML rendering,
and the query-parameter filter end-to-end exactly the surface
collaborators will hit.
"""
from __future__ import annotations
import json
import socket
import threading
import time
import urllib.error
import urllib.request
from datetime import UTC, datetime, timedelta
from http.server import HTTPServer
from pathlib import Path
import pytest
from neuropose.io import (
JobStatus,
JobStatusEntry,
StatusFile,
save_status,
)
from neuropose.monitor import (
STALE_THRESHOLD_SECONDS,
build_server,
render_status_html,
serve_forever,
)
# ---------------------------------------------------------------------------
# Helpers
# ---------------------------------------------------------------------------
def _free_port() -> int:
"""Grab an unused TCP port and release it for the caller to bind.
``HTTPServer`` does not offer a first-class "bind to an ephemeral
port" API that also exposes the chosen port to the caller before
serving. Doing a separate SO_REUSEADDR probe to pick one is
reliable enough for a test-only fixture.
"""
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
s.bind(("127.0.0.1", 0))
return int(s.getsockname()[1])
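The probe above leaves a small window in which another process could claim the port before ``build_server`` binds it. When the server object is constructed directly, binding to port 0 avoids the race entirely: the kernel chooses the port at bind time and it can be read back before serving. A sketch of that alternative (it assumes direct ``HTTPServer`` construction; whether ``build_server`` itself accepts ``port=0`` is not established by these tests):

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# port=0 asks the kernel to pick a free port at bind time; reading
# server_address afterwards closes the probe/bind race window.
server = HTTPServer(("127.0.0.1", 0), BaseHTTPRequestHandler)
chosen_port = server.server_address[1]  # already bound: no race
server.server_close()
```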
@pytest.fixture
def status_path(tmp_path: Path) -> Path:
"""Return a path to a populated status.json for the monitor tests."""
path = tmp_path / "status.json"
now = datetime.now(UTC)
status = StatusFile(
root={
"job_running": JobStatusEntry(
status=JobStatus.PROCESSING,
started_at=now - timedelta(minutes=2),
current_video="trial_01.mp4",
frames_processed=450,
frames_total=1200,
videos_completed=0,
videos_total=3,
percent_complete=12.5,
last_update=now,
),
"job_done": JobStatusEntry(
status=JobStatus.COMPLETED,
started_at=now - timedelta(minutes=10),
completed_at=now - timedelta(minutes=5),
results_path=Path("/tmp/out/job_done/results.json"),
videos_completed=2,
videos_total=2,
percent_complete=100.0,
last_update=now - timedelta(minutes=5),
),
"job_dead": JobStatusEntry(
status=JobStatus.FAILED,
started_at=now - timedelta(minutes=6),
completed_at=now - timedelta(minutes=5),
error="VideoDecodeError: OpenCV could not open video",
),
}
)
save_status(path, status)
return path
@pytest.fixture
def running_server(status_path: Path):
"""Boot the monitor in a background thread, yield the base URL."""
port = _free_port()
server = build_server(status_path, host="127.0.0.1", port=port)
thread = threading.Thread(target=server.serve_forever, daemon=True)
thread.start()
# Tiny settle delay so the first request doesn't race with
# serve_forever's setup. The stdlib server is synchronous enough
# that a few ms is plenty.
time.sleep(0.05)
try:
yield f"http://127.0.0.1:{port}"
finally:
server.shutdown()
server.server_close()
thread.join(timeout=2)
def _get(url: str, *, timeout: float = 2.0) -> tuple[int, bytes, dict[str, str]]:
try:
with urllib.request.urlopen(url, timeout=timeout) as resp:
return resp.status, resp.read(), dict(resp.headers)
except urllib.error.HTTPError as exc:
return exc.code, exc.read(), dict(exc.headers) if exc.headers else {}
# ---------------------------------------------------------------------------
# HTML rendering (pure function, no server)
# ---------------------------------------------------------------------------
class TestRenderStatusHtml:
def test_empty_status_shows_empty_state(self, tmp_path: Path) -> None:
html_text = render_status_html(
StatusFile(root={}),
status_path=tmp_path / "status.json",
now=datetime.now(UTC),
)
assert "No jobs tracked yet" in html_text
assert "auto-refresh" in html_text
def test_processing_entry_renders_progress_bar(self, status_path: Path) -> None:
from neuropose.io import load_status
status = load_status(status_path)
html_text = render_status_html(
status,
status_path=status_path,
now=datetime.now(UTC),
)
assert "job_running" in html_text
assert "<progress" in html_text
assert "12.5" in html_text
assert "trial_01.mp4" in html_text
def test_completed_entry_shows_100_percent(self, status_path: Path) -> None:
from neuropose.io import load_status
status = load_status(status_path)
html_text = render_status_html(
status,
status_path=status_path,
now=datetime.now(UTC),
)
assert "job_done" in html_text
assert "100" in html_text
def test_failed_entry_shows_error_message(self, status_path: Path) -> None:
from neuropose.io import load_status
status = load_status(status_path)
html_text = render_status_html(
status,
status_path=status_path,
now=datetime.now(UTC),
)
assert "job_dead" in html_text
assert "VideoDecodeError" in html_text
def test_error_message_is_html_escaped(self, tmp_path: Path) -> None:
now = datetime.now(UTC)
status = StatusFile(
root={
"x": JobStatusEntry(
status=JobStatus.FAILED,
started_at=now,
completed_at=now,
error="<script>alert('xss')</script>",
)
}
)
html_text = render_status_html(
status,
status_path=tmp_path / "status.json",
now=now,
)
assert "<script>" not in html_text
assert "&lt;script&gt;" in html_text
def test_stale_processing_entry_gets_badge(self, tmp_path: Path) -> None:
now = datetime.now(UTC)
stale_update = now - timedelta(seconds=STALE_THRESHOLD_SECONDS + 30)
status = StatusFile(
root={
"stuck": JobStatusEntry(
status=JobStatus.PROCESSING,
started_at=now - timedelta(hours=1),
current_video="video.mp4",
frames_processed=10,
frames_total=100,
videos_completed=0,
videos_total=1,
percent_complete=10.0,
last_update=stale_update,
),
}
)
html_text = render_status_html(status, status_path=tmp_path / "status.json", now=now)
# The badge text is "stale — no update for X s". The class
# 'stale' also appears in the inline <style> block, so we
# match the human-readable badge text to distinguish.
assert "no update for" in html_text
def test_fresh_processing_entry_has_no_stale_badge(self, tmp_path: Path) -> None:
now = datetime.now(UTC)
status = StatusFile(
root={
"ok": JobStatusEntry(
status=JobStatus.PROCESSING,
started_at=now - timedelta(seconds=5),
current_video="x.mp4",
frames_processed=5,
frames_total=100,
videos_completed=0,
videos_total=1,
percent_complete=5.0,
last_update=now,
),
}
)
html_text = render_status_html(status, status_path=tmp_path / "status.json", now=now)
# Same logic: the CSS class is always present, so we match on
# the badge text which is only injected when the entry is
# actually stale.
assert "no update for" not in html_text
# ---------------------------------------------------------------------------
# HTTP server
# ---------------------------------------------------------------------------
class TestHtmlRoute:
def test_root_returns_html(self, running_server: str) -> None:
status, body, headers = _get(f"{running_server}/")
assert status == 200
assert "text/html" in headers.get("Content-Type", "")
text = body.decode()
assert "NeuroPose job status" in text
assert "job_running" in text
assert "<progress" in text
def test_root_has_no_store_cache_control(self, running_server: str) -> None:
_, _, headers = _get(f"{running_server}/")
assert "no-store" in headers.get("Cache-Control", "")
class TestJsonRoute:
def test_status_json_returns_all_entries(self, running_server: str) -> None:
status, body, headers = _get(f"{running_server}/status.json")
assert status == 200
assert "application/json" in headers.get("Content-Type", "")
data = json.loads(body)
assert set(data.keys()) == {"job_running", "job_done", "job_dead"}
def test_status_json_filter_returns_single_entry(self, running_server: str) -> None:
status, body, _ = _get(f"{running_server}/status.json?job=job_running")
assert status == 200
data = json.loads(body)
assert data["status"] == "processing"
assert data["current_video"] == "trial_01.mp4"
assert data["percent_complete"] == 12.5
def test_status_json_filter_unknown_job_is_404(self, running_server: str) -> None:
status, _, _ = _get(f"{running_server}/status.json?job=nope")
assert status == 404
class TestHealthRoute:
def test_health_returns_ok(self, running_server: str) -> None:
status, body, _ = _get(f"{running_server}/health")
assert status == 200
assert json.loads(body) == {"status": "ok"}
class TestUnknownRoutes:
def test_unknown_path_is_404(self, running_server: str) -> None:
status, _, _ = _get(f"{running_server}/wat")
assert status == 404
# ---------------------------------------------------------------------------
# Monitor survives a missing status file
# ---------------------------------------------------------------------------
class TestMissingStatusFile:
def test_monitor_reads_missing_file_as_empty(self, tmp_path: Path) -> None:
port = _free_port()
path = tmp_path / "status.json" # deliberately not created
server = build_server(path, host="127.0.0.1", port=port)
thread = threading.Thread(target=server.serve_forever, daemon=True)
thread.start()
time.sleep(0.05)
try:
status, body, _ = _get(f"http://127.0.0.1:{port}/status.json")
assert status == 200
assert json.loads(body) == {}
finally:
server.shutdown()
server.server_close()
thread.join(timeout=2)
# ---------------------------------------------------------------------------
# serve_forever wrapper
# ---------------------------------------------------------------------------
class TestServeForever:
def test_serve_forever_shuts_down_on_keyboard_interrupt(
self,
status_path: Path,
monkeypatch: pytest.MonkeyPatch,
) -> None:
"""``serve_forever`` should catch KeyboardInterrupt and clean up.
We monkeypatch the stdlib HTTPServer's ``serve_forever`` to
immediately raise, which simulates a Ctrl-C before any request
is accepted. The wrapper should swallow it, log, and close the
server.
"""
def fake_serve_forever(self) -> None:
raise KeyboardInterrupt
monkeypatch.setattr(HTTPServer, "serve_forever", fake_serve_forever)
# Should return cleanly, not propagate the KeyboardInterrupt.
serve_forever(status_path, host="127.0.0.1", port=_free_port())