This commit is contained in:
Levi Neuwirth 2026-04-15 11:41:27 -04:00
parent d29f4f1b78
commit 8e1c8833f5
19 changed files with 2566 additions and 13 deletions

.gitignore vendored

@@ -85,6 +85,20 @@ site/
# Allow committing example videos under tests/fixtures/ explicitly.
!tests/fixtures/**
# --- Benchmark scratch directory -------------------------------------------
# ``benchmarks/videos/`` is a workflow convenience: drop benchmark inputs
# there, rsync the whole repo checkout to the research Mac, and the videos
# travel alongside the code without needing a second path on either end.
# Everything inside is ignored except the README; the global video-suffix
# rules above already cover common extensions, and this belt catches any
# other cruft (partial rsync transfers, .DS_Store, etc.) that might land
# in the directory.
#
# Policy reminder: benchmark test videos only. Subject / clinical data
# must still go through $XDG_DATA_HOME — see benchmarks/README.md.
/benchmarks/videos/*
!/benchmarks/videos/README.md
# --- Secrets ---------------------------------------------------------------
.env
.env.*


@@ -86,7 +86,13 @@ be split into per-release sections once tagging begins.
width, height), `VideoPredictions` (metadata envelope + frames
mapping + optional `segmentations` field), `JobResults`,
`JobStatus` enum, `JobStatusEntry` (with a structured `error`
field plus optional live-progress fields — `current_video`,
`frames_processed`, `frames_total`, `videos_completed`,
`videos_total`, `percent_complete`, `last_update` — populated by
the interfacer during inference and consumed by
`neuropose.monitor`), and `StatusFile`. Legacy status files
written before the progress fields existed still load cleanly
because every new field is optional with a `None` default. Performance schema: frozen
`PerformanceMetrics` carrying per-call timings
(`model_load_seconds`, `total_seconds`, `per_frame_latencies_ms`),
`peak_rss_mb`, `active_device`, `tensorflow_metal_active`, and
@@ -154,6 +160,48 @@ be split into per-release sections once tagging begins.
download was truncated). Post-load interface check for
`detect_poses`, `per_skeleton_joint_names`, and
`per_skeleton_joint_edges`.
- **`neuropose.monitor`** — localhost HTTP status dashboard. A small
`http.server`-based server (pure stdlib, zero new runtime
dependencies) that serves a plain HTML page at `GET /` with an
auto-refresh meta tag, one row per tracked job, a
`<progress>` bar, and a stale-entry warning badge for
`processing` jobs whose `last_update` has not ticked in 60 s.
`GET /status.json` returns the raw validated `StatusFile` as JSON
for `curl`/scripted pipelines; `?job=<name>` filters to a single
entry. `GET /health` is a simple liveness probe. Binds to
`127.0.0.1:8765` by default — loopback-only, with an explicit
`--host` override required to expose externally. Every request
re-reads `status.json`, so the monitor has no in-memory cache, no
sync protocol with the daemon, and stays useful even if the
daemon is down (last-known state surfaced with the stale badge).
- **Progress checkpointing in the interfacer.** `Interfacer` now
updates the currently-running job's `JobStatusEntry` every
`settings.status_checkpoint_every_frames` frames (default 30, a
new `Settings` field) during inference via the estimator's
`progress` callback. Each checkpoint rewrites `status.json`
atomically through the existing `save_status` helper; writes are
best-effort and I/O failures are logged without interrupting
inference. `_run_job_inner` seeds a "videos_total=N" checkpoint
before calling the estimator so the monitor shows the job's
scope from the first poll. Checkpoint cadence is knob-exposed for
operators who want to tune the smoothness-vs-write-rate trade-off.
- **`neuropose.ingest`** — zip-archive intake utility. `ingest_zip()`
extracts a zip of videos into one job directory per video under
`$data_dir/in/`, with validation-before-write (path-traversal and
absolute-path members rejected, oversize archives rejected at the
20 GB-uncompressed cap), zip-internal and external collision
detection reported in one shot, non-video members silently
skipped (`.DS_Store`, `README.md`, etc.), and per-job atomic
placement via a staging directory + `os.rename`. Nested paths are
flattened into job names by joining components with underscores
and sanitising unsafe characters — `patient_001/trial_01.mp4`
becomes job `patient_001_trial_01`, preserving disambiguation
against a sibling `patient_002/trial_01.mp4`. Typed exception
hierarchy: `IngestError`, `ArchiveInvalidError`,
`ArchiveEmptyError`, `ArchiveTooLargeError`, `JobCollisionError`
(with a `.collisions` list of offending names). The running
daemon needs no changes — ingested job dirs are picked up on the
next poll.
- **`neuropose.benchmark`** — multi-pass inference benchmarking for
a single video. `run_benchmark()` runs `process_video` N times
(default 5), always discards the first pass as warmup (graph
@@ -200,10 +248,19 @@ be split into per-release sections once tagging begins.
`slow`) loads the real model and asserts the constant still
matches, so any upstream skeleton drift fails CI.
- **`neuropose.cli`** — Typer-based command-line interface with
seven subcommands: `watch` (run the daemon), `process <video>`
(run the estimator on a single video), `ingest <archive>` (unzip
a video archive into per-video job directories under
`$data_dir/in/` with validation-before-write and atomic
placement; `--force` overwrites collisions, otherwise the whole
operation refuses if any target name already exists),
`serve` (start the localhost HTTP monitor at `127.0.0.1:8765`
by default — `--host` and `--port` are the two overrides;
KeyboardInterrupt exits with the standard shell-interruption
code and an `OSError` at bind time is translated to a clean
usage error with the bind target in the message),
`segment <results>` (post-hoc repetition segmentation — loads a
JobResults or a single VideoPredictions, runs
`neuropose.analyzer.segment.segment_predictions` with the chosen
extractor and thresholds, and atomically writes the file back
with the new segmentation attached under `--name`),

benchmarks/README.md Normal file

@@ -0,0 +1,115 @@
# `benchmarks/`
Workflow convenience directory for performance benchmarking. Drop
videos into `benchmarks/videos/`, then rsync the whole repo to a
research machine — the videos travel alongside the code without
needing a second path on either end.
This directory is **not** a persistent data store. It is a local
scratch area for ephemeral benchmark inputs. Everything under
`benchmarks/videos/` is gitignored (see the `.gitignore` rules and
the per-directory README) so committing videos by accident is
mechanically prevented.
## Policy
**Only put benchmark test videos here. Never subject or clinical
recordings.** Subject data still goes through
`$XDG_DATA_HOME/neuropose/` via the normal interfacer workflow, and
that rule is load-bearing for the project's data-handling posture.
The distinction in practice:
| Type of video | Goes here | Goes to `$XDG_DATA_HOME` |
| ------------------------------------------------ | --------- | ------------------------ |
| Synthetic test videos (gradient frames, etc.) | Yes | No |
| Public-domain reference footage | Yes | No |
| Recordings you personally filmed for benchmarking | Yes | No |
| Anything a clinician recorded | **No** | Yes |
| Anything with an identifiable subject | **No** | Yes |
| Anything IRB-gated | **No** | Yes |
When in doubt, route through `$XDG_DATA_HOME`. That path has
cross-machine isolation by design; this directory does not.
## Usage
Assuming you have a short test video to work with:
```console
$ cp ~/Downloads/short_clip.mp4 benchmarks/videos/
$ uv run neuropose benchmark benchmarks/videos/short_clip.mp4 \
--repeats 5 --warmup-frames 3 \
--output benchmarks/videos/short_clip_run.json
```
The `*.json` benchmark output is also gitignored — `.json` is not
a tracked extension inside `benchmarks/videos/` because only
`README.md` is whitelisted in that directory.
## Rsyncing to the research Mac
The directory layout is designed so one `rsync` path covers both
code and videos:
```console
$ rsync -av --delete \
--exclude='.venv/' \
--exclude='site/' \
--exclude='.git/' \
~/Repos/research/brown/shu/neuropose/ \
mac.local:~/Repos/research/brown/shu/neuropose/
```
After the sync, the videos in `benchmarks/videos/` on the Mac are
identical to the ones on Linux, so a benchmark run on the Mac can
reference the same filename the Linux report does, making
cross-machine comparisons trivial.
Tips:
- Add `--exclude='benchmarks/videos/*.json'` if you want to keep
per-machine benchmark results isolated.
- `--delete` makes the target exactly mirror the source. Without
it, old files on the target persist — safer but surprising.
- For one-off pushes, `scp benchmarks/videos/clip.mp4
mac.local:~/Repos/research/brown/shu/neuropose/benchmarks/videos/`
works without touching the rest of the repo.
## Bulk intake via `neuropose ingest`
When you have a whole batch of recordings instead of a single
benchmark clip, drop the zip archive anywhere you like (the
`benchmarks/videos/` directory is fine for transient test zips) and
let `neuropose ingest` unpack them into per-video job directories
for the running daemon:
```console
$ uv run neuropose ingest benchmarks/videos/session_2026-04-15.zip
ingested 12 job(s) from benchmarks/videos/session_2026-04-15.zip (1842.3 MB, 2 non-video member(s) skipped)
patient_001_trial_01/trial_01.mp4
patient_001_trial_02/trial_02.mp4
...
```
Each video becomes its own `$data_dir/in/<job_name>/` directory,
and the daemon picks them up on its next poll with no further
operator action. Pass `--force` to overwrite existing job
directories with the same derived names. See the `neuropose
ingest --help` output for the full flag surface and
[`neuropose.ingest`](../docs/api/ingest.md) for the library API.
**Reminder:** the `benchmarks/videos/` directory is for benchmark
test data only, including the zip archives you pass to `ingest`.
Actual clinical recordings should not transit through this
directory — route those through `$XDG_DATA_HOME` instead.
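The job names in the listing above come from flattening each member's in-archive path. A rough sketch of that derivation (illustrative only; the real sanitiser in `neuropose.ingest` may differ in edge cases):

```python
import re
from pathlib import PurePosixPath

# Characters outside this set become "_" (assumed pattern for this sketch).
_SAFE = re.compile(r"[^A-Za-z0-9._-]+")

def derive_job_name(member_name: str) -> str:
    """Flatten an in-archive path into a job name (sketch)."""
    p = PurePosixPath(member_name)
    flat = "_".join(p.parts)  # patient_001/trial_01.mp4 -> patient_001_trial_01.mp4
    stem = flat[: -len(p.suffix)] if p.suffix else flat  # drop the extension
    return _SAFE.sub("_", stem)  # sanitise anything unsafe
```

So `patient_001/trial_01.mp4` and `patient_002/trial_01.mp4` map to distinct jobs, as the listing above shows.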
## Directory layout
```
benchmarks/
├── README.md # this file (tracked)
└── videos/
├── README.md # placeholder to keep the directory tracked
└── <your-clip>.mp4 # ignored
```


@@ -0,0 +1,12 @@
# `benchmarks/videos/`
Scratch directory for benchmark input videos. Everything in here
except this README is gitignored.
**Drop benchmark test videos only. Never subject or clinical
recordings.** See `../README.md` for the full policy and the
rsync-to-Mac workflow.
This file exists so the directory itself is trackable — git does
not track empty directories, and a placeholder README is more
discoverable than a `.gitkeep`.

docs/api/ingest.md Normal file

@@ -0,0 +1,3 @@
# `neuropose.ingest`
::: neuropose.ingest

docs/api/monitor.md Normal file

@@ -0,0 +1,3 @@
# `neuropose.monitor`
::: neuropose.monitor


@@ -68,6 +68,77 @@ output. This is the "is `tensorflow-metal` producing correct
numerics?" check that `RESEARCH.md`'s TensorFlow-version-compatibility
section leaves open for v0.1.
### monitor
**Role:** localhost HTTP dashboard. Reads `$data_dir/out/status.json`
on every request and serves a small HTML page (with an auto-refresh
`<meta>` tag, per-job progress bars, and stale-entry warnings) plus
the raw `StatusFile` as JSON at `/status.json`. Runs as a separate
process from the daemon — operators start it with `neuropose serve`,
it is safe to run alongside `neuropose watch`, and it stays useful
even if the daemon is down (last-known state is shown with the stale
badge flagging any lingering `processing` entries).
Design choices:
- **Pure stdlib.** The server is built on `http.server` with a small
request-handler subclass. No FastAPI, no Flask — this is a localhost
tool and the cost of a framework is not justified.
- **Loopback by default.** Binds to `127.0.0.1:8765` with an explicit
`--host` flag to override. Collaborators on the same machine reach
it directly; collaborators elsewhere should go through an SSH
tunnel or explicitly configured reverse proxy. Binding to
`0.0.0.0` is a real network-exposure decision the operator should
make with eyes open.
- **No cache.** Every request re-reads `status.json`, which is tiny
and already written atomically by the daemon, so no sync protocol
is needed between the two processes.
- **Two surfaces, same data.** `GET /` renders HTML for browsers;
`GET /status.json` returns the raw `StatusFile` for `curl` /
scripted pipelines. `?job=<name>` filters to a single entry for
programmatic consumers that only care about one job.
- **Live progress data comes from the interfacer**, not from the
monitor. The interfacer checkpoints the currently-running job's
`JobStatusEntry` every
`settings.status_checkpoint_every_frames` frames (default 30) via
the estimator's `progress` callback, updating `frames_processed`,
`percent_complete`, and `last_update` on the in-memory status file
and calling `save_status`. The monitor then reads those fields and
renders the progress bar — no separate live channel, no in-memory
cache on the monitor side.
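A minimal sketch of that checkpoint cadence (illustrative: `save_status_atomic` is a hypothetical stand-in for the real `save_status` helper, and the plain-dict status is a simplification of the validated `StatusFile` model):

```python
import json
import logging
import os
import tempfile
import time
from pathlib import Path

def save_status_atomic(path: Path, status: dict) -> None:
    # Stand-in for save_status: write a sibling temp file, then
    # os.replace() so readers never observe a partial status.json.
    fd, tmp = tempfile.mkstemp(dir=str(path.parent), suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(status, f)
    os.replace(tmp, path)

def make_progress_callback(status_path: Path, status: dict,
                           job_name: str, every: int = 30):
    """Build a progress callback that checkpoints every `every` frames."""
    def progress(frames_done: int, frames_total: int) -> None:
        if frames_done % every and frames_done != frames_total:
            return  # off-cadence frame: no write
        entry = status["jobs"][job_name]
        entry["frames_processed"] = frames_done
        entry["frames_total"] = frames_total
        entry["percent_complete"] = round(100 * frames_done / frames_total, 1)
        entry["last_update"] = time.time()
        try:
            save_status_atomic(status_path, status)
        except OSError:
            # Best-effort: a failed status write must not stop inference.
            logging.getLogger(__name__).warning("status checkpoint failed")
    return progress
```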
### ingest
**Role:** bulk-intake utility. Accepts a zip archive of videos and
produces one job directory per video under `$data_dir/in/`, ready for
the `interfacer` daemon to pick up on its next poll.
The ingester is a **pure library call** (`ingest_zip`) plus a thin
CLI wrapper (`neuropose ingest`). It does not run inference itself —
it only stages files so the existing watch-directory pipeline does
the work. Key guarantees:
- **Validate-before-write.** Path-traversal members, zip bombs, and
empty archives are rejected before any file lands on disk, so a
failed ingest leaves the operator with a clean state.
- **Transactional placement.** Each video is extracted to a staging
directory under `$data_dir/.ingest_<uuid>/`, and only then
atomically renamed into `$data_dir/in/<job_name>/`. The daemon
never sees a half-populated job directory.
- **Collision detection is up-front and exhaustive.** Zip-internal
collisions (two videos that flatten to the same job name) and
external collisions (a job directory already exists) are reported
as a single error listing every offending name. `--force`
deletes-and-replaces; without it, nothing is written.
- **Flattening preserves disambiguation.** The in-archive path is
joined with underscores into the job name — `patient_001/trial_01.mp4`
→ job `patient_001_trial_01` — so nested organisation survives the
flattening without collapsing into silent collisions.
The set of accepted extensions comes from
`neuropose.interfacer.VIDEO_EXTENSIONS`, so any format the daemon
can already process is a valid ingest target.
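The transactional-placement guarantee can be sketched in isolation (validation and collision handling elided; this is not the real `ingest_zip`):

```python
import shutil
import uuid
import zipfile
from pathlib import Path

def stage_then_rename(archive_path: Path, input_dir: Path) -> list[Path]:
    """Extract to a staging dir, then flip each job into place atomically."""
    # Staging lives next to input_dir so the final rename stays on the
    # same filesystem and is therefore a single atomic rename(2).
    staging = input_dir.parent / f".ingest_{uuid.uuid4().hex}"
    staging.mkdir(parents=True)
    placed: list[Path] = []
    try:
        with zipfile.ZipFile(archive_path) as zf:
            for member in zf.infolist():
                if member.is_dir():
                    continue
                job_dir = staging / Path(member.filename).stem
                job_dir.mkdir()
                with zf.open(member) as src, \
                        (job_dir / Path(member.filename).name).open("wb") as dst:
                    shutil.copyfileobj(src, dst)
        # Only after the whole archive extracted cleanly do the renames
        # happen, so a watcher never sees a half-populated job directory.
        for job_dir in sorted(staging.iterdir()):
            target = input_dir / job_dir.name
            job_dir.rename(target)
            placed.append(target)
        return placed
    finally:
        shutil.rmtree(staging, ignore_errors=True)
```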
### interfacer
**Role:** job-lifecycle daemon. Watches `input_dir` for new job


@@ -93,6 +93,8 @@ nav:
- neuropose.config: api/config.md
- neuropose.estimator: api/estimator.md
- neuropose.interfacer: api/interfacer.md
- neuropose.ingest: api/ingest.md
- neuropose.monitor: api/monitor.md
- neuropose.io: api/io.md
- neuropose.benchmark: api/benchmark.md
- neuropose.analyzer.segment: api/segment.md


@@ -1,11 +1,17 @@
"""NeuroPose command-line interface.
Seven subcommands:
- ``neuropose watch`` run the :class:`~neuropose.interfacer.Interfacer`
daemon against the configured input directory.
- ``neuropose process <video>`` run the estimator on a single video and
write the predictions JSON to disk.
- ``neuropose ingest <archive>`` extract a zip archive of videos into
per-video job directories under ``$data_dir/in/``, queued for the
running daemon to process.
- ``neuropose serve`` start the :mod:`~neuropose.monitor` localhost
HTTP dashboard so collaborators can watch a run's progress in a
browser or via ``curl``.
- ``neuropose segment <results>`` post-hoc repetition segmentation of
an existing predictions file. Attaches a named
:class:`~neuropose.io.Segmentation` to every video it contains and
@@ -256,6 +262,170 @@ def process(
typer.echo(f"wrote {out_path} ({result.frame_count} frames)")
# ---------------------------------------------------------------------------
# ingest
# ---------------------------------------------------------------------------
@app.command()
def ingest(
ctx: typer.Context,
archive: Annotated[
Path,
typer.Argument(
exists=True,
file_okay=True,
dir_okay=False,
readable=True,
help=(
"Path to a zip archive of videos. Each video inside becomes "
"its own job directory under $data_dir/in/ and is picked up "
"by the running daemon on its next poll."
),
),
],
force: Annotated[
bool,
typer.Option(
"--force",
help=(
"Overwrite any existing job directories that collide with "
"the archive's contents. Without --force, the first "
"collision aborts the whole operation without writing to "
"disk."
),
),
] = False,
) -> None:
"""Ingest a zip archive of videos into the daemon's input directory.
Each video member of the archive produces one new job directory in
``$data_dir/in/<job_name>/``. Job names are derived from the
in-archive path (slashes become underscores, extension dropped,
unsafe characters sanitized) so nested structures like
``patient_001/trial_01.mp4`` and ``patient_002/trial_01.mp4``
cannot collide into the same job.
The ingest is transactional: either every video in the archive is
placed, or none are. Zip-internal collisions, external collisions
with existing job directories, path-traversal members, and
total-size overruns are all caught before any disk writes happen.
Non-video members (``README.md``, ``.DS_Store``, etc.) are silently
skipped. The set of accepted extensions matches the interfacer's
:data:`~neuropose.interfacer.VIDEO_EXTENSIONS`.
"""
# Deferred import keeps the ingest module's dependencies (zipfile,
# shutil) out of the CLI's top-level import surface.
from neuropose.ingest import (
ArchiveEmptyError,
ArchiveInvalidError,
ArchiveTooLargeError,
IngestError,
JobCollisionError,
ingest_zip,
)
settings: Settings = ctx.obj
try:
result = ingest_zip(
archive,
input_dir=settings.input_dir,
force=force,
)
except JobCollisionError as exc:
typer.echo(f"error: {exc}", err=True)
for name in exc.collisions:
typer.echo(f" - {name}", err=True)
raise typer.Exit(code=EXIT_USAGE) from exc
except ArchiveEmptyError as exc:
typer.echo(f"error: {exc}", err=True)
raise typer.Exit(code=EXIT_USAGE) from exc
except ArchiveTooLargeError as exc:
typer.echo(f"error: {exc}", err=True)
raise typer.Exit(code=EXIT_USAGE) from exc
except ArchiveInvalidError as exc:
typer.echo(f"error: {exc}", err=True)
raise typer.Exit(code=EXIT_USAGE) from exc
except IngestError as exc:
typer.echo(f"error: {exc}", err=True)
raise typer.Exit(code=EXIT_USAGE) from exc
typer.echo(
f"ingested {result.job_count} job(s) from {archive} "
f"({result.total_uncompressed_bytes / (1024 * 1024):.1f} MB, "
f"{len(result.skipped_non_videos)} non-video member(s) skipped)"
)
for job in result.ingested:
typer.echo(f" {job.job_name}/{job.video_filename}")
# ---------------------------------------------------------------------------
# serve
# ---------------------------------------------------------------------------
@app.command()
def serve(
ctx: typer.Context,
host: Annotated[
str,
typer.Option(
"--host",
help=(
"Interface to bind. Defaults to 127.0.0.1 so the "
"monitor is loopback-only; pass 0.0.0.0 or an explicit "
"external IP only when you have thought through the "
"network exposure."
),
),
] = "127.0.0.1",
port: Annotated[
int,
typer.Option(
"--port",
min=1,
max=65535,
help="TCP port to listen on.",
),
] = 8765,
) -> None:
"""Serve the NeuroPose localhost status dashboard.
Reads the daemon's ``status.json`` (at ``Settings.status_file``)
on every HTTP request and renders a small dashboard at ``/`` plus
JSON at ``/status.json`` and a liveness probe at ``/health``.
Safe to run alongside ``neuropose watch``; safe to run without
the daemon (last-known state is served with a stale-entry warning
for any lingering ``processing`` jobs).
Blocks in ``serve_forever`` until interrupted by SIGINT or
SIGTERM, at which point the server shuts down cleanly.
"""
settings: Settings = ctx.obj
# Deferred import: the monitor module and its stdlib HTTP server
# are only needed for this subcommand, and keeping them off the
# watch/process hot path avoids paying their import cost on the
# inference daemon's startup.
from neuropose.monitor import serve_forever
status_path = settings.status_file
typer.echo(f"serving neuropose monitor on http://{host}:{port}/")
typer.echo(f" reading {status_path}")
typer.echo(f" JSON: http://{host}:{port}/status.json")
typer.echo(" press Ctrl-C to stop")
try:
serve_forever(status_path, host=host, port=port)
except KeyboardInterrupt as exc:
raise typer.Exit(code=EXIT_INTERRUPTED) from exc
except OSError as exc:
# Typical failure: address already in use, or permission
# denied binding to a privileged port. Translate to a clean
# usage error rather than a raw traceback.
typer.echo(f"error: could not bind {host}:{port}: {exc}", err=True)
raise typer.Exit(code=EXIT_USAGE) from exc
# ---------------------------------------------------------------------------
# segment
# ---------------------------------------------------------------------------


@@ -78,6 +78,19 @@ class Settings(BaseSettings):
poll_interval_seconds: int = Field(default=10, ge=1)
device: str = Field(default="/CPU:0")
default_fov_degrees: float = Field(default=55.0, gt=0.0, lt=180.0)
status_checkpoint_every_frames: int = Field(
default=30,
ge=1,
description=(
"Write the in-progress job's status entry back to status.json "
"every N frames during inference. Powers the live progress "
"bar in neuropose.monitor. Lower values produce smoother "
"progress at the cost of more atomic-rename writes; higher "
"values are cheaper but leave collaborators staring at a "
"stale percentage for longer. The default (30) is a smooth "
"update every ~3 s at 10 fps inference."
),
)
@field_validator("device")
@classmethod

src/neuropose/ingest.py Normal file

@@ -0,0 +1,421 @@
"""Zip-archive ingestion for the NeuroPose job queue.
Operators frequently arrive with a zip archive of videos pulled off a
camera SD card, exported from a study laptop, etc., and want every
video inside processed as an independent job without manually
unpacking the archive, renaming collisions, and copying files into
``$data_dir/in/<job_name>/`` one at a time. This module automates
exactly that workflow.
Responsibilities
----------------
- **Validate the archive before touching disk.** Path-traversal members
(absolute paths, ``..`` components) are refused; the total
uncompressed size is capped; only files whose suffixes are in
:data:`neuropose.interfacer.VIDEO_EXTENSIONS` are considered as
candidates. Non-video members are logged and skipped, not treated
as errors; zips commonly carry ``.DS_Store``, ``README.md``, etc.
- **Derive one job name per video.** The in-archive path is
slash-replaced, extension-stripped, and sanitized so that nested
structure like ``patient_001/trial_01.mp4`` becomes
``patient_001_trial_01`` and never collides with a sibling
``trial_01.mp4`` in another directory.
- **Atomic per-job placement.** Each video is extracted to a staging
directory under ``$data_dir/.ingest_<uuid>/`` first. Only after the
whole archive has been successfully extracted does the ingester
rename each staged job directory into ``$data_dir/in/<job_name>/``
with :func:`os.rename`, which is atomic on the same filesystem.
This way the daemon never sees a half-extracted job directory and
the "empty directory mid-copy" skip heuristic in the interfacer is
never exercised during ingest.
- **Collision detection, up-front.** Zip-internal collisions (two
videos that would flatten to the same job name) and external
collisions (a job directory of the same name already exists under
``in/``) are detected *before* any disk write. The default is to
refuse the operation and report every colliding name at once;
``force=True`` deletes the colliding ``in/<job_name>/`` directories
and proceeds.
This module is pure library: the CLI wrapper lives in
:mod:`neuropose.cli`, and the function here does not touch the
logger's configuration. Errors surface as typed exceptions so the
CLI can translate them into stable exit codes.
"""
from __future__ import annotations
import logging
import re
import shutil
import uuid
import zipfile
from dataclasses import dataclass, field
from pathlib import Path, PurePosixPath
from neuropose.interfacer import VIDEO_EXTENSIONS
logger = logging.getLogger(__name__)
# ---------------------------------------------------------------------------
# Tunables
# ---------------------------------------------------------------------------
MAX_UNCOMPRESSED_BYTES: int = 20 * 1024 * 1024 * 1024 # 20 GB
"""Maximum total uncompressed size of a single zip archive.
Not a hard clinical limit just a zip-bomb guard. A real research
archive with a few dozen multi-GB recordings stays well below this
number; anything above it is almost certainly a mistake or an attack.
The cap is enforced by summing :attr:`zipfile.ZipInfo.file_size` over
every candidate member *before* any extraction starts, so a bomb that
expands to 100 GB is rejected without writing a single byte to disk.
"""
_JOB_NAME_SAFE_PATTERN = re.compile(r"[^A-Za-z0-9._-]+")
"""Characters allowed in a derived job name. Everything else becomes ``_``."""
# ---------------------------------------------------------------------------
# Exceptions
# ---------------------------------------------------------------------------
class IngestError(Exception):
"""Base class for errors raised by :func:`ingest_zip`."""
class ArchiveInvalidError(IngestError):
"""The zip archive is malformed, unreadable, or contains unsafe members."""
class ArchiveEmptyError(IngestError):
"""The archive contains no files with supported video extensions."""
class ArchiveTooLargeError(IngestError):
"""The archive's total uncompressed size exceeds :data:`MAX_UNCOMPRESSED_BYTES`."""
class JobCollisionError(IngestError):
"""One or more target job directories already exist, or two videos collapse to the same name.
The ``collisions`` attribute lists the colliding job names so the
caller can report them in one shot rather than piecemeal.
"""
def __init__(self, message: str, collisions: list[str]) -> None:
super().__init__(message)
self.collisions = collisions
# ---------------------------------------------------------------------------
# Result container
# ---------------------------------------------------------------------------
@dataclass(frozen=True)
class IngestedJob:
"""A single job directory created by :func:`ingest_zip`."""
job_name: str
job_dir: Path
video_filename: str
uncompressed_bytes: int
@dataclass(frozen=True)
class IngestResult:
"""Aggregate result of :func:`ingest_zip`."""
archive: Path
ingested: list[IngestedJob] = field(default_factory=list)
skipped_non_videos: list[str] = field(default_factory=list)
@property
def job_count(self) -> int:
"""Return the number of job directories created."""
return len(self.ingested)
@property
def total_uncompressed_bytes(self) -> int:
"""Return the total uncompressed size of all ingested videos in bytes."""
return sum(job.uncompressed_bytes for job in self.ingested)
# ---------------------------------------------------------------------------
# Public API
# ---------------------------------------------------------------------------
def ingest_zip(
archive_path: Path,
input_dir: Path,
*,
force: bool = False,
) -> IngestResult:
"""Extract videos from a zip archive into per-video job directories.
Parameters
----------
archive_path
Path to the zip archive to ingest.
input_dir
The interfacer's input directory — typically
``Settings.input_dir`` (``$data_dir/in``). Each video in the
archive produces a new subdirectory here, and the running
daemon picks them up on its next poll.
force
If ``True``, overwrite any pre-existing ``input_dir/<job_name>/``
directories that collide with the archive's contents by
removing them first. If ``False`` (default), raise
:class:`JobCollisionError` listing every colliding name and
perform no disk writes.
Returns
-------
IngestResult
Record of which jobs were created and which archive members
were skipped as non-video.
Raises
------
FileNotFoundError
If ``archive_path`` does not exist.
ArchiveInvalidError
If the archive is not a valid zip, contains a member with an
absolute path or traversal components, or cannot be read.
ArchiveEmptyError
If the archive contains no files with supported video
extensions.
ArchiveTooLargeError
If the total uncompressed size of candidate video members
exceeds :data:`MAX_UNCOMPRESSED_BYTES`.
JobCollisionError
If two members of the archive flatten to the same job name,
or if any target ``input_dir/<job_name>/`` exists and
``force`` is ``False``.
"""
if not archive_path.exists():
raise FileNotFoundError(f"archive not found: {archive_path}")
input_dir.mkdir(parents=True, exist_ok=True)
# Phase 1: inspect the archive, decide every target job name, and
# catch all collisions before touching disk.
plan = _plan_ingest(archive_path, input_dir=input_dir, force=force)
# Phase 2: extract every video into a staging directory under the
# same filesystem as input_dir, so the final os.rename is atomic.
# Using a uuid in the staging directory name avoids collisions
# with concurrent ingests and with the daemon's lock file.
staging_root = input_dir.parent / f".ingest_{uuid.uuid4().hex}"
staging_root.mkdir(parents=True, exist_ok=False)
ingested: list[IngestedJob] = []
try:
with zipfile.ZipFile(archive_path, "r") as archive:
for entry in plan.entries:
staged_job_dir = staging_root / entry.job_name
staged_job_dir.mkdir(parents=True, exist_ok=False)
staged_video = staged_job_dir / entry.output_filename
with (
archive.open(entry.member) as src,
staged_video.open("wb") as dst,
):
shutil.copyfileobj(src, dst)
# Phase 3: move the staged job directories into place. The
# delete-then-rename dance under --force is only non-atomic
# per-job — each individual job still flips from "does not
# exist" to "fully populated" in one rename(2) call.
for entry in plan.entries:
staged_job_dir = staging_root / entry.job_name
target_job_dir = input_dir / entry.job_name
if force and target_job_dir.exists():
shutil.rmtree(target_job_dir)
staged_job_dir.rename(target_job_dir)
ingested.append(
IngestedJob(
job_name=entry.job_name,
job_dir=target_job_dir,
video_filename=entry.output_filename,
uncompressed_bytes=entry.member.file_size,
)
)
except Exception:
# Clean up whatever made it into input_dir so the caller is
# not left with a half-ingested state on, e.g., a zip that
# turns out to be truncated partway through extraction. We do
# NOT unwind under success; this branch is the failure path.
for job in ingested:
shutil.rmtree(job.job_dir, ignore_errors=True)
raise
finally:
shutil.rmtree(staging_root, ignore_errors=True)
logger.info(
"Ingested %d job(s) from %s (%d non-video members skipped)",
len(ingested),
archive_path,
len(plan.skipped_non_videos),
)
return IngestResult(
archive=archive_path,
ingested=ingested,
skipped_non_videos=plan.skipped_non_videos,
)
# ---------------------------------------------------------------------------
# Private planning helpers
# ---------------------------------------------------------------------------
@dataclass(frozen=True)
class _PlannedEntry:
"""One zip member bound for a specific job directory."""
member: zipfile.ZipInfo
job_name: str
output_filename: str
@dataclass(frozen=True)
class _IngestPlan:
"""Result of :func:`_plan_ingest` — a concrete set of extractions."""
entries: list[_PlannedEntry]
skipped_non_videos: list[str]
def _plan_ingest(
archive_path: Path,
*,
input_dir: Path,
force: bool,
) -> _IngestPlan:
"""Scan the archive and produce a validated extraction plan.
No disk is touched except for opening the zip read-only. Every
failure mode that we want to surface before extraction (bad archive,
traversal, empty, too big, collision) is decided in this function
so :func:`ingest_zip` is either a no-op or a complete success.
"""
try:
archive = zipfile.ZipFile(archive_path, "r")
except zipfile.BadZipFile as exc:
raise ArchiveInvalidError(f"not a valid zip archive: {archive_path}") from exc
with archive:
try:
members = archive.infolist()
except Exception as exc:
raise ArchiveInvalidError(
f"could not read archive member list: {archive_path}"
) from exc
entries: list[_PlannedEntry] = []
skipped: list[str] = []
job_name_to_member: dict[str, str] = {}
total_bytes = 0
for member in members:
if member.is_dir():
continue
member_name = member.filename
_check_member_path_safe(member_name)
# Filter by suffix. Non-video members are not errors — the
# zip can carry README.md, .DS_Store, etc. alongside the
# real payload.
if PurePosixPath(member_name).suffix.lower() not in VIDEO_EXTENSIONS:
skipped.append(member_name)
continue
total_bytes += member.file_size
if total_bytes > MAX_UNCOMPRESSED_BYTES:
raise ArchiveTooLargeError(
f"uncompressed size exceeds {MAX_UNCOMPRESSED_BYTES} bytes "
f"(cap) after including {member_name}; refusing to extract"
)
job_name = _derive_job_name(member_name)
output_filename = PurePosixPath(member_name).name
if job_name in job_name_to_member:
raise JobCollisionError(
"zip contains two videos that flatten to the same job name "
f"{job_name!r}: {job_name_to_member[job_name]!r} and {member_name!r}",
collisions=[job_name],
)
job_name_to_member[job_name] = member_name
entries.append(
_PlannedEntry(
member=member,
job_name=job_name,
output_filename=output_filename,
)
)
if not entries:
raise ArchiveEmptyError(
f"archive {archive_path} contains no video files with "
f"supported extensions ({sorted(VIDEO_EXTENSIONS)})"
)
# External collisions: a pre-existing job directory with the same
# name. Report every collision at once so the operator sees the
# full picture in one error, not one at a time.
if not force:
existing = [entry.job_name for entry in entries if (input_dir / entry.job_name).exists()]
if existing:
raise JobCollisionError(
f"{len(existing)} job director{'y' if len(existing) == 1 else 'ies'} "
f"already exist{'s' if len(existing) == 1 else ''} in {input_dir}; "
f"pass --force to overwrite",
collisions=existing,
)
return _IngestPlan(entries=entries, skipped_non_videos=skipped)
def _check_member_path_safe(name: str) -> None:
"""Refuse zip members with absolute paths or ``..`` traversal.
Python's :mod:`zipfile` does not raise on these by default on
3.11, so we validate each member name before it is extracted.
"""
if not name:
raise ArchiveInvalidError("zip archive contains an empty member name")
if name.startswith("/") or (len(name) >= 2 and name[1] == ":"):
raise ArchiveInvalidError(f"zip archive contains an absolute-path member: {name!r}")
# Zip member names should use forward slashes per the spec; be
# defensive anyway and normalise backslashes before the
# component-level check on ``..`` below.
parts = PurePosixPath(name.replace("\\", "/")).parts
if any(part == ".." for part in parts):
raise ArchiveInvalidError(
f"zip archive contains a traversal member ({name!r}); refusing to extract"
)
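The traversal rules enforced above can be demonstrated standalone. This is an illustrative re-implementation (the `is_safe` helper below is hypothetical, not this module's function):

```python
import io
import zipfile
from pathlib import PurePosixPath

def is_safe(name: str) -> bool:
    # Mirror the checks above: non-empty, not absolute, no Windows
    # drive letter, and no ".." path component.
    if not name or name.startswith("/") or (len(name) >= 2 and name[1] == ":"):
        return False
    return ".." not in PurePosixPath(name.replace("\\", "/")).parts

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr("ok/clip.mp4", b"v")
    z.writestr("../escape.mp4", b"v")

with zipfile.ZipFile(buf) as z:
    for info in z.infolist():
        print(info.filename, is_safe(info.filename))
# ok/clip.mp4 True
# ../escape.mp4 False
```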
def _derive_job_name(member_name: str) -> str:
"""Convert an in-archive path into a safe job-directory name.
The full in-archive path (minus the final extension) is joined
with underscores so ``patient_001/trial_01.mp4`` becomes
``patient_001_trial_01`` and stays unambiguous even if a sibling
directory carries a file named ``trial_01.mp4``. Any character
outside ``[A-Za-z0-9._-]`` is replaced with an underscore, then
runs of underscores are collapsed, leading/trailing underscores
are stripped, and the result is non-empty by construction (a
pathological all-symbol name falls back to ``video``).
"""
path = PurePosixPath(member_name)
stem_parts = [*path.parts[:-1], path.stem]
raw = "_".join(stem_parts)
sanitised = _JOB_NAME_SAFE_PATTERN.sub("_", raw)
# Collapse runs of underscores and strip leading/trailing ones so
# the result doesn't look like `_foo__bar_`.
collapsed = re.sub(r"_+", "_", sanitised).strip("_")
return collapsed or "video"
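The derivation rules in the docstring can be exercised with a standalone re-implementation (illustrative sketch; the real logic is the function above):

```python
import re
from pathlib import PurePosixPath

def derive_job_name(member_name: str) -> str:
    # Join path components (minus the extension) with underscores,
    # replace unsafe characters, collapse runs, strip the edges.
    path = PurePosixPath(member_name)
    raw = "_".join([*path.parts[:-1], path.stem])
    sanitised = re.sub(r"[^A-Za-z0-9._-]", "_", raw)
    collapsed = re.sub(r"_+", "_", sanitised).strip("_")
    return collapsed or "video"

print(derive_job_name("patient_001/trial_01.mp4"))  # patient_001_trial_01
print(derive_job_name("a b/c&d.mp4"))               # a_b_c_d
print(derive_job_name("$$$.mp4"))                   # video
```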


@@ -259,6 +259,15 @@ class Interfacer:
Raises on any failure — the caller in :meth:`process_job` handles
the transition to the failed state and the quarantine move.
During inference the interfacer updates the job's
:class:`JobStatusEntry` roughly every
:attr:`~neuropose.config.Settings.status_checkpoint_every_frames`
frames, so :mod:`neuropose.monitor` can render a live progress
bar for collaborators. Each checkpoint is a full atomic rewrite
of ``status.json`` via :func:`save_status`, which is
acceptable because scheduled pose inference is many orders of
magnitude more expensive than the write itself.
"""
job_in_path = self._settings.input_dir / job_name
job_out_path = self._settings.output_dir / job_name
@@ -271,15 +280,58 @@ class Interfacer:
f"(accepted extensions: {sorted(VIDEO_EXTENSIONS)})"
)
videos_total = len(videos)
# Seed the initial progress checkpoint so the monitor shows
# "videos_total=N, videos_completed=0" from the first poll after
# the job starts, rather than waiting until a callback fires.
self._checkpoint_progress(
job_name,
started_at=started_at,
current_video=videos[0].name,
frames_processed=0,
frames_total=None,
videos_completed=0,
videos_total=videos_total,
)
per_video_predictions = {}
checkpoint_every = self._settings.status_checkpoint_every_frames
for video_index, video_path in enumerate(videos):
if self._stop:
raise JobProcessingError(
f"stop requested mid-job after processing "
f"{len(per_video_predictions)}/{videos_total} videos"
)
logger.info("[%s] Processing video %s", job_name, video_path.name)
def _on_frame(
processed: int,
total_hint: int,
*,
# Bind the loop-local values as defaults so each iteration's
# closure is self-contained — Python closures are late-binding
# over loop variables, so this guards against the callback
# being held and invoked after the loop advances.
_job_name: str = job_name,
_started_at: datetime = started_at,
_current_video: str = video_path.name,
_video_index: int = video_index,
_videos_total: int = videos_total,
_checkpoint_every: int = checkpoint_every,
) -> None:
if processed % _checkpoint_every != 0:
return
self._checkpoint_progress(
_job_name,
started_at=_started_at,
current_video=_current_video,
frames_processed=processed,
frames_total=total_hint if total_hint > 0 else None,
videos_completed=_video_index,
videos_total=_videos_total,
)
result = self._estimator.process_video(video_path, progress=_on_frame)
per_video_predictions[video_path.name] = result.predictions
logger.info(
"[%s] Processed %s (%d frames)",
@@ -287,6 +339,19 @@ class Interfacer:
video_path.name,
result.frame_count,
)
# Post-video checkpoint: snap videos_completed to the end of
# this video even if the last frame didn't fall on the
# checkpoint cadence, so the monitor's "N / M videos done"
# line is always exact after a video finishes.
self._checkpoint_progress(
job_name,
started_at=started_at,
current_video=video_path.name,
frames_processed=result.frame_count,
frames_total=result.frame_count,
videos_completed=video_index + 1,
videos_total=videos_total,
)
job_results = JobResults(root=per_video_predictions)
results_path = job_out_path / "results.json"
@@ -298,8 +363,73 @@ class Interfacer:
started_at=started_at,
completed_at=datetime.now(UTC),
results_path=results_path,
videos_completed=videos_total,
videos_total=videos_total,
percent_complete=100.0,
last_update=datetime.now(UTC),
)
def _checkpoint_progress(
self,
job_name: str,
*,
started_at: datetime,
current_video: str,
frames_processed: int,
frames_total: int | None,
videos_completed: int,
videos_total: int,
) -> None:
"""Rewrite ``status.json`` with the current per-job progress.
Computes ``percent_complete`` across the whole job: videos that
have fully finished contribute ``1.0`` each, and the current
video contributes ``frames_processed / frames_total`` if the
frame-count hint is known, else a partial-credit estimate of
``0.5``. The sum of those contributions is then divided by
``videos_total`` and scaled to 0-100.
Never raises on an I/O error — progress checkpoints are
best-effort. If the write fails we log and move on so the
inference loop keeps making forward progress.
"""
if frames_total and frames_total > 0:
current_fraction = frames_processed / frames_total
elif frames_processed > 0:
current_fraction = 0.5
else:
current_fraction = 0.0
overall_fraction = (videos_completed + current_fraction) / max(videos_total, 1)
percent = max(0.0, min(100.0, overall_fraction * 100.0))
try:
status = load_status(self._settings.status_file)
existing = status.root.get(job_name)
if existing is None:
# The entry should have been seeded by process_job before
# _run_job_inner was called. If it isn't there, skip the
# checkpoint — something else has already deleted the
# entry and we do not want to recreate a ghost.
return
status.root[job_name] = existing.model_copy(
update={
"current_video": current_video,
"frames_processed": frames_processed,
"frames_total": frames_total,
"videos_completed": videos_completed,
"videos_total": videos_total,
"percent_complete": percent,
"last_update": datetime.now(UTC),
}
)
save_status(self._settings.status_file, status)
except Exception:
logger.warning(
"[%s] Failed to checkpoint progress; continuing inference",
job_name,
exc_info=True,
)
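The percent arithmetic above can be checked in isolation with a plain-function sketch (hypothetical standalone version of the computation, not the method itself):

```python
def percent_complete(frames_processed, frames_total, videos_completed, videos_total):
    # Finished videos count as 1.0 each; the in-flight video contributes
    # its frame fraction, or 0.5 partial credit when the total is unknown.
    if frames_total and frames_total > 0:
        current = frames_processed / frames_total
    elif frames_processed > 0:
        current = 0.5
    else:
        current = 0.0
    overall = (videos_completed + current) / max(videos_total, 1)
    return max(0.0, min(100.0, overall * 100.0))

print(percent_complete(50, 200, 1, 4))   # 31.25: one video done + a quarter of the next
print(percent_complete(10, None, 0, 2))  # 25.0: unknown frame total -> 0.5 credit
```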
def _discover_new_jobs(self, status: StatusFile) -> list[str]:
"""Return names of job subdirectories not yet tracked in ``status``.


@@ -515,7 +515,24 @@ class JobResults(RootModel[dict[str, VideoPredictions]]):
class JobStatusEntry(BaseModel):
"""Status entry for a single job in the persistent status file.
The progress fields (``current_video``, ``frames_processed``,
``frames_total``, ``videos_completed``, ``videos_total``,
``percent_complete``, ``last_update``) are populated by the
:class:`~neuropose.interfacer.Interfacer` as inference proceeds,
on a cadence driven by
:attr:`~neuropose.config.Settings.status_checkpoint_every_frames`.
They are optional so that legacy status files written before the
monitor shipped still validate on load, and so that any entry in
a terminal state (``completed`` / ``failed``) can leave them as
``None`` without confusing the monitor.
``percent_complete`` is the overall job progress in
``[0.0, 100.0]``, computed across all videos in a multi-video job
(each finished video counts as a whole unit; the in-flight video
contributes a fractional term). The monitor renders it directly
into the progress bar.
"""
model_config = ConfigDict(extra="forbid")
@@ -530,6 +547,58 @@ class JobStatusEntry(BaseModel):
"Populated by the interfacer on failure paths."
),
)
current_video: str | None = Field(
default=None,
description=(
"Filename (basename, no directory) of the video currently "
"being processed within the job. ``None`` for jobs that "
"have not started yet or that are in a terminal state."
),
)
frames_processed: int | None = Field(
default=None,
ge=0,
description="Frames processed in the current video.",
)
frames_total: int | None = Field(
default=None,
ge=0,
description=(
"OpenCV's ``CAP_PROP_FRAME_COUNT`` hint for the current "
"video. Unreliable for variable-rate videos; callers that "
"rely on this for ETA should treat it as an estimate."
),
)
videos_completed: int | None = Field(
default=None,
ge=0,
description="Number of videos finished in this job.",
)
videos_total: int | None = Field(
default=None,
ge=0,
description="Total number of videos in this job.",
)
percent_complete: float | None = Field(
default=None,
ge=0.0,
le=100.0,
description=(
"Overall job progress in ``[0.0, 100.0]``, computed across "
"all videos in the job. ``None`` for entries that have no "
"progress data yet (recently-created or pre-monitor)."
),
)
last_update: datetime | None = Field(
default=None,
description=(
"Wall-clock time of the most recent progress checkpoint. "
"Compared against ``datetime.now(UTC)`` by the monitor to "
"flag stale entries — a ``processing`` entry whose "
"``last_update`` is several minutes old may indicate a "
"wedged daemon."
),
)
class StatusFile(RootModel[dict[str, JobStatusEntry]]):

src/neuropose/monitor.py Normal file

@@ -0,0 +1,447 @@
"""Localhost HTTP status monitor for the NeuroPose daemon.
Serves a small dashboard that collaborators on the same machine can
visit in a browser to watch a batch run make progress. The page reads
the persistent :class:`~neuropose.io.StatusFile` on every request — no
new state, no in-memory cache, no sync protocol with the daemon — so a
``neuropose serve`` process is trivially safe to run alongside
``neuropose watch`` and stays useful even if the daemon is down.
Three URLs are exposed:
- ``GET /`` returns a plain HTML page with a progress bar per job,
an auto-refresh ``<meta>`` tag, and a stale-entry warning when a
processing job hasn't checkpointed in a while.
- ``GET /status.json`` returns the raw validated
:class:`~neuropose.io.StatusFile` as JSON, so any collaborator with
``curl`` (or a scripted pipeline) can consume the same data the
browser sees. ``?job=<name>`` filters to a single entry.
- ``GET /health`` is a simple ``200 OK`` so external uptime
checks can tell that the server process is running without parsing
the HTML.
By default the server binds to ``127.0.0.1:8765``. It is **not**
exposed on any external interface by default — collaborators on the
same machine reach it directly, and collaborators elsewhere should go
through an SSH tunnel or explicitly configured reverse proxy. Binding
to ``0.0.0.0`` requires an explicit ``--host`` override, because
that is a real network-exposure decision the operator should make
with eyes open.
Dependencies
------------
Pure stdlib: :mod:`http.server`, :mod:`json`, :mod:`datetime`. No
FastAPI, no Flask, no Tornado — this is a localhost tool and the
cost of a framework is not justified. Keeping it in stdlib also
means the monitor has zero runtime dependency surface that could
conflict with the rest of the project's pins.
"""
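The stdlib-only pattern the docstring describes reduces to a handler subclass plus `HTTPServer`. A minimal self-contained sketch of the `/health` round trip (the `HealthHandler` below is hypothetical; the real handler adds the dashboard and JSON routes):

```python
import json
import threading
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, HTTPServer

class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Mirror the /health contract: a tiny fixed JSON body.
        body = b'{"status":"ok"}'
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # keep the sketch quiet

# Port 0 asks the OS for any free port, so the sketch never collides
# with a live monitor on 8765.
server = HTTPServer(("127.0.0.1", 0), HealthHandler)
threading.Thread(target=server.handle_request, daemon=True).start()

conn = HTTPConnection("127.0.0.1", server.server_address[1])
conn.request("GET", "/health")
resp = conn.getresponse()
payload = json.loads(resp.read())
print(resp.status, payload)  # 200 {'status': 'ok'}
server.server_close()
```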
from __future__ import annotations
import html
import json
import logging
from datetime import UTC, datetime
from http import HTTPStatus
from http.server import BaseHTTPRequestHandler, HTTPServer
from pathlib import Path
from typing import Any
from urllib.parse import parse_qs, urlsplit
from neuropose.io import JobStatus, JobStatusEntry, StatusFile, load_status
logger = logging.getLogger(__name__)
# ---------------------------------------------------------------------------
# Tunables
# ---------------------------------------------------------------------------
DEFAULT_HOST = "127.0.0.1"
DEFAULT_PORT = 8765
HTML_REFRESH_SECONDS = 5
"""How often the HTML page auto-refreshes via ``<meta http-equiv>``."""
STALE_THRESHOLD_SECONDS = 60
"""A ``processing`` entry with ``last_update`` older than this many
seconds is flagged as stale in the HTML. 60 s is 20x the default
checkpoint cadence at 10 fps inference, so anything beyond it is very
likely a wedged daemon rather than normal jitter."""
# ---------------------------------------------------------------------------
# HTML rendering
# ---------------------------------------------------------------------------
_HTML_TEMPLATE = """\
<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta http-equiv="refresh" content="{refresh}">
<title>NeuroPose job status</title>
<style>
body {{
font-family: -apple-system, BlinkMacSystemFont, "Segoe UI",
"Helvetica Neue", Arial, sans-serif;
max-width: 960px;
margin: 2em auto;
padding: 0 1em;
color: #1a1a1a;
}}
h1 {{ margin-bottom: 0.25em; }}
.subtitle {{ color: #555; margin-top: 0; font-size: 0.9em; }}
.empty {{
padding: 2em;
background: #f6f6f6;
border-radius: 8px;
text-align: center;
color: #555;
}}
table {{
width: 100%;
border-collapse: collapse;
margin-top: 1em;
}}
th, td {{
text-align: left;
padding: 0.6em 0.8em;
border-bottom: 1px solid #eee;
vertical-align: top;
}}
th {{
font-weight: 600;
background: #fafafa;
border-bottom: 2px solid #ccc;
}}
progress {{ width: 100%; height: 1em; }}
.status-processing {{ color: #1765bd; font-weight: 600; }}
.status-completed {{ color: #2a8f3a; font-weight: 600; }}
.status-failed {{ color: #b93232; font-weight: 600; }}
.stale {{
display: inline-block;
background: #fff3cd;
color: #8a6d1b;
padding: 0 0.4em;
border-radius: 4px;
font-size: 0.8em;
margin-left: 0.4em;
}}
.error-msg {{
font-family: ui-monospace, "SF Mono", Menlo, Consolas, monospace;
font-size: 0.85em;
color: #b93232;
white-space: pre-wrap;
word-break: break-word;
}}
.muted {{ color: #888; font-size: 0.85em; }}
</style>
</head>
<body>
<h1>NeuroPose job status</h1>
<p class="subtitle">
Reading {status_path} &middot;
as of {now_iso} &middot;
auto-refresh every {refresh} s &middot;
<a href="/status.json">status.json</a>
</p>
{body}
</body>
</html>
"""
def render_status_html(status: StatusFile, *, status_path: Path, now: datetime) -> str:
"""Render the full HTML dashboard for ``status``."""
if status.is_empty():
body = (
'<div class="empty">No jobs tracked yet. '
"Ingest a zip archive or drop a job directory into "
"<code>$data_dir/in/</code> to get started.</div>"
)
else:
rows = "\n".join(_render_row(name, entry, now=now) for name, entry in status.root.items())
body = (
"<table>\n"
"<thead><tr>"
"<th>Job</th>"
"<th>Status</th>"
"<th>Progress</th>"
"<th>Current video</th>"
"<th>Started</th>"
"<th>Last update</th>"
"</tr></thead>\n"
f"<tbody>\n{rows}\n</tbody>\n"
"</table>\n"
)
return _HTML_TEMPLATE.format(
refresh=HTML_REFRESH_SECONDS,
status_path=html.escape(str(status_path)),
now_iso=html.escape(now.isoformat(timespec="seconds")),
body=body,
)
def _render_row(name: str, entry: JobStatusEntry, *, now: datetime) -> str:
"""Render one ``<tr>`` for a single job entry."""
status_class = f"status-{entry.status.value}"
status_cell = f'<span class="{status_class}">{html.escape(entry.status.value)}</span>'
stale_badge = ""
if entry.status == JobStatus.PROCESSING and entry.last_update is not None:
age_seconds = (now - entry.last_update).total_seconds()
if age_seconds > STALE_THRESHOLD_SECONDS:
stale_badge = f'<span class="stale">stale — no update for {int(age_seconds)} s</span>'
status_cell += stale_badge
progress_cell = _render_progress_cell(entry)
current_video = html.escape(entry.current_video or "")
if entry.videos_total is not None and entry.videos_completed is not None:
current_video += (
f' <span class="muted">({entry.videos_completed}/'
f"{entry.videos_total} videos done)</span>"
)
started_cell = html.escape(entry.started_at.isoformat(timespec="seconds"))
last_update_cell = (
html.escape(entry.last_update.isoformat(timespec="seconds"))
if entry.last_update is not None
else '<span class="muted">—</span>'
)
error_row = ""
if entry.status == JobStatus.FAILED and entry.error:
error_row = (
f'<tr><td colspan="6" class="error-msg">error: {html.escape(entry.error)}</td></tr>'
)
return (
f"<tr>"
f"<td>{html.escape(name)}</td>"
f"<td>{status_cell}</td>"
f"<td>{progress_cell}</td>"
f"<td>{current_video}</td>"
f"<td>{started_cell}</td>"
f"<td>{last_update_cell}</td>"
f"</tr>"
f"{error_row}"
)
def _render_progress_cell(entry: JobStatusEntry) -> str:
"""Render the progress-bar cell for one job."""
if entry.status == JobStatus.COMPLETED:
return '<progress value="100" max="100"></progress> 100%'
if entry.status == JobStatus.FAILED:
return '<span class="muted">—</span>'
if entry.percent_complete is None:
return '<progress max="100"></progress> <span class="muted">starting…</span>'
pct = entry.percent_complete
detail = ""
if entry.frames_processed is not None and entry.frames_total:
detail = (
f' <span class="muted">'
f"({entry.frames_processed}/{entry.frames_total} frames"
f"{_render_eta(entry)})</span>"
)
return f'<progress value="{pct:.1f}" max="100"></progress> {pct:.1f}%{detail}'
def _render_eta(entry: JobStatusEntry) -> str:
"""Return a ``, ETA ~<duration>`` suffix if an ETA can be computed."""
if entry.started_at is None or entry.percent_complete is None:
return ""
if entry.percent_complete <= 0.0 or entry.percent_complete >= 100.0:
return ""
now = datetime.now(UTC)
elapsed = (now - entry.started_at).total_seconds()
if elapsed <= 0.0:
return ""
fraction = entry.percent_complete / 100.0
total_estimated = elapsed / fraction
remaining = max(0.0, total_estimated - elapsed)
return f", ETA ~{_format_duration(remaining)}"
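A worked instance of the extrapolation above: with 25% complete after 30 s of elapsed time, the linear estimate is

```python
# fraction = percent_complete / 100; total is estimated as elapsed / fraction.
elapsed = 30.0
fraction = 0.25
total_estimated = elapsed / fraction   # 120.0 s projected for the whole job
remaining = max(0.0, total_estimated - elapsed)
print(remaining)  # 90.0
```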
def _format_duration(seconds: float) -> str:
"""Format a duration in seconds as ``HH:MM:SS`` or ``MM:SS`` or ``SS s``."""
seconds_int = round(seconds)
if seconds_int < 60:
return f"{seconds_int} s"
if seconds_int < 3600:
m, s = divmod(seconds_int, 60)
return f"{m}:{s:02d}"
h, rem = divmod(seconds_int, 3600)
m, s = divmod(rem, 60)
return f"{h}:{m:02d}:{s:02d}"
# ---------------------------------------------------------------------------
# Request handler
# ---------------------------------------------------------------------------
class _StatusRequestHandler(BaseHTTPRequestHandler):
"""HTTP handler that serves the status dashboard and JSON.
The status file path is injected via the server's ``status_path``
attribute (see :func:`build_server`). Using a subclass + instance
attribute avoids a module-level global and keeps the handler
trivially parametrisable for tests.
"""
# Silence the stdlib server's default stderr access log; route it
# through the package logger instead so operators can tune it like
# any other neuropose module.
def log_message(self, format: str, *args: Any) -> None:
logger.debug("%s - - %s", self.address_string(), format % args)
def do_GET(self) -> None:
parsed = urlsplit(self.path)
path = parsed.path
if path == "/":
self._serve_html()
elif path == "/status.json":
self._serve_json(parse_qs(parsed.query))
elif path == "/health":
self._serve_health()
else:
self.send_error(HTTPStatus.NOT_FOUND, "unknown path")
def _serve_html(self) -> None:
status = self._load_status()
body = render_status_html(
status,
status_path=self.server.status_path, # type: ignore[attr-defined]
now=datetime.now(UTC),
).encode("utf-8")
self.send_response(HTTPStatus.OK)
self.send_header("Content-Type", "text/html; charset=utf-8")
self.send_header("Content-Length", str(len(body)))
self.send_header("Cache-Control", "no-store")
self.end_headers()
self.wfile.write(body)
def _serve_json(self, query: dict[str, list[str]]) -> None:
status = self._load_status()
payload: Any = status.model_dump(mode="json")
job_filter = query.get("job", [None])[0]
if job_filter is not None:
if job_filter not in status.root:
self.send_error(
HTTPStatus.NOT_FOUND,
f"no such job: {job_filter}",
)
return
payload = status.root[job_filter].model_dump(mode="json")
body = json.dumps(payload, indent=2).encode("utf-8")
self.send_response(HTTPStatus.OK)
self.send_header("Content-Type", "application/json; charset=utf-8")
self.send_header("Content-Length", str(len(body)))
self.send_header("Cache-Control", "no-store")
self.end_headers()
self.wfile.write(body)
def _serve_health(self) -> None:
body = b'{"status":"ok"}'
self.send_response(HTTPStatus.OK)
self.send_header("Content-Type", "application/json")
self.send_header("Content-Length", str(len(body)))
self.end_headers()
self.wfile.write(body)
def _load_status(self) -> StatusFile:
"""Load the current status file.
The file is read on every request — no in-memory cache — which
is cheap in absolute terms (the file is small) and means the
monitor never serves stale data relative to the daemon's most
recent atomic write.
"""
path: Path = self.server.status_path # type: ignore[attr-defined]
return load_status(path)
# ---------------------------------------------------------------------------
# Server construction and lifecycle
# ---------------------------------------------------------------------------
class _StatusServer(HTTPServer):
"""HTTPServer subclass that carries the status file path.
The path is needed inside every request handler, and the stdlib
server lets subclasses add arbitrary attributes — cleaner than a
module-level global or a closure-captured handler class.
"""
status_path: Path
def __init__(
self,
server_address: tuple[str, int],
status_path: Path,
) -> None:
super().__init__(server_address, _StatusRequestHandler)
self.status_path = status_path
def build_server(
status_path: Path,
*,
host: str = DEFAULT_HOST,
port: int = DEFAULT_PORT,
) -> _StatusServer:
"""Construct (but do not start) a status monitor HTTP server.
Parameters
----------
status_path
Path to the daemon's ``status.json``. Typically
``Settings.status_file``.
host
Interface to bind. Defaults to ``127.0.0.1`` — explicitly
loopback — so an unconfigured monitor cannot be reached over the
network without an operator opting in. Pass ``0.0.0.0`` or a
specific external IP only when you have thought through the
exposure.
port
TCP port to listen on. Defaults to 8765.
Returns
-------
_StatusServer
A ready-to-serve HTTP server. Call ``.serve_forever()`` to
block, or ``.handle_request()`` once for tests.
"""
return _StatusServer((host, port), status_path)
def serve_forever(
status_path: Path,
*,
host: str = DEFAULT_HOST,
port: int = DEFAULT_PORT,
) -> None:
"""Start the monitor and block until the process is interrupted.
Logs the bound address once on startup so operators can copy it
into a browser. Catches :class:`KeyboardInterrupt` and shuts the
server down cleanly so Ctrl-C produces a quick, no-traceback exit.
"""
server = build_server(status_path, host=host, port=port)
logger.info(
"NeuroPose monitor listening on http://%s:%d/ (reading %s)",
host,
port,
status_path,
)
try:
server.serve_forever()
except KeyboardInterrupt:
logger.info("Interrupt received; shutting down monitor")
finally:
server.server_close()


@@ -92,6 +92,8 @@ class TestTopLevelOptions:
assert result.exit_code in (EXIT_OK, EXIT_USAGE)
assert "watch" in result.output
assert "process" in result.output
assert "ingest" in result.output
assert "serve" in result.output
assert "segment" in result.output
assert "benchmark" in result.output
assert "analyze" in result.output
@@ -102,7 +104,15 @@
assert "NeuroPose" in result.output
def test_subcommand_help(self, runner: CliRunner) -> None:
for subcommand in (
"watch",
"process",
"ingest",
"serve",
"segment",
"benchmark",
"analyze",
):
result = runner.invoke(app, [subcommand, "--help"])
assert result.exit_code == EXIT_OK, f"{subcommand} --help failed"
@@ -215,6 +225,158 @@ class TestProcess:
assert "commit 11" in result.output
# ---------------------------------------------------------------------------
# ingest
# ---------------------------------------------------------------------------
def _write_zip_for_cli(path: Path, members: dict[str, bytes]) -> Path:
"""Build a zip fixture for the CLI ingest tests."""
import zipfile
with zipfile.ZipFile(path, "w") as z:
for name, data in members.items():
z.writestr(name, data)
return path
class TestIngestSubcommand:
def test_ingest_happy_path_creates_job_dirs(
self,
runner: CliRunner,
tmp_path: Path,
xdg_home: Path,
) -> None:
archive = _write_zip_for_cli(
tmp_path / "session.zip",
{"clip_01.mp4": b"video one", "clip_02.mp4": b"video two"},
)
result = runner.invoke(app, ["ingest", str(archive)])
assert result.exit_code == EXIT_OK, result.output
assert "ingested 2 job(s)" in result.output
# Default Settings.data_dir is $XDG_DATA_HOME/neuropose/jobs, and
# input_dir = data_dir/in.
input_dir = xdg_home / "neuropose" / "jobs" / "in"
assert (input_dir / "clip_01" / "clip_01.mp4").read_bytes() == b"video one"
assert (input_dir / "clip_02" / "clip_02.mp4").read_bytes() == b"video two"
def test_ingest_reports_skipped_non_videos(
self,
runner: CliRunner,
tmp_path: Path,
) -> None:
archive = _write_zip_for_cli(
tmp_path / "a.zip",
{"clip.mp4": b"v", "README.md": b"readme"},
)
result = runner.invoke(app, ["ingest", str(archive)])
assert result.exit_code == EXIT_OK, result.output
assert "1 non-video member(s) skipped" in result.output
def test_ingest_missing_archive_is_usage_error(
self,
runner: CliRunner,
tmp_path: Path,
) -> None:
result = runner.invoke(app, ["ingest", str(tmp_path / "nope.zip")])
assert result.exit_code == EXIT_USAGE
def test_ingest_empty_archive_is_usage_error(
self,
runner: CliRunner,
tmp_path: Path,
) -> None:
archive = _write_zip_for_cli(tmp_path / "empty.zip", {})
result = runner.invoke(app, ["ingest", str(archive)])
assert result.exit_code == EXIT_USAGE
assert "no video files" in result.output.lower()
def test_ingest_collision_without_force_is_usage_error(
self,
runner: CliRunner,
tmp_path: Path,
xdg_home: Path,
) -> None:
archive = _write_zip_for_cli(tmp_path / "a.zip", {"clip.mp4": b"v"})
# The first ingest creates the job directory; the second run
# then collides with it.
runner.invoke(app, ["ingest", str(archive)])
result = runner.invoke(app, ["ingest", str(archive)])
assert result.exit_code == EXIT_USAGE
assert "already exist" in result.output.lower()
assert "--force" in result.output
def test_ingest_force_overwrites(
self,
runner: CliRunner,
tmp_path: Path,
xdg_home: Path,
) -> None:
archive = _write_zip_for_cli(tmp_path / "a.zip", {"clip.mp4": b"original"})
first = runner.invoke(app, ["ingest", str(archive)])
assert first.exit_code == EXIT_OK, first.output
# Build a new archive with the same job name but different bytes.
archive2 = _write_zip_for_cli(tmp_path / "b.zip", {"clip.mp4": b"overwritten"})
second = runner.invoke(app, ["ingest", str(archive2), "--force"])
assert second.exit_code == EXIT_OK, second.output
input_dir = xdg_home / "neuropose" / "jobs" / "in"
assert (input_dir / "clip" / "clip.mp4").read_bytes() == b"overwritten"
def test_ingest_traversal_rejected_as_usage_error(
self,
runner: CliRunner,
tmp_path: Path,
) -> None:
archive = _write_zip_for_cli(tmp_path / "a.zip", {"../escape.mp4": b"v"})
result = runner.invoke(app, ["ingest", str(archive)])
assert result.exit_code == EXIT_USAGE
assert "traversal" in result.output.lower()
# ---------------------------------------------------------------------------
# serve
# ---------------------------------------------------------------------------
class TestServeSubcommand:
def test_serve_exits_cleanly_on_keyboard_interrupt(
self,
runner: CliRunner,
monkeypatch: pytest.MonkeyPatch,
) -> None:
"""``neuropose serve`` should translate Ctrl-C into EXIT_INTERRUPTED.
We patch ``serve_forever`` to raise ``KeyboardInterrupt``
immediately, which simulates a Ctrl-C before any request is
served. The CLI's handler should map that to the standard
shell-interruption exit code.
"""
def fake_serve_forever(status_path, *, host, port) -> None:
del status_path, host, port
raise KeyboardInterrupt
monkeypatch.setattr("neuropose.monitor.serve_forever", fake_serve_forever)
result = runner.invoke(app, ["serve"])
assert result.exit_code == EXIT_INTERRUPTED
def test_serve_bind_error_is_usage_error(
self,
runner: CliRunner,
monkeypatch: pytest.MonkeyPatch,
) -> None:
"""A port-already-in-use OSError should surface as EXIT_USAGE."""
def fake_serve_forever(status_path, *, host, port) -> None:
del status_path, host, port
raise OSError("address already in use")
monkeypatch.setattr("neuropose.monitor.serve_forever", fake_serve_forever)
result = runner.invoke(app, ["serve"])
assert result.exit_code == EXIT_USAGE
assert "could not bind" in result.output.lower()
# ---------------------------------------------------------------------------
# segment
# ---------------------------------------------------------------------------

tests/unit/test_ingest.py Normal file

@@ -0,0 +1,292 @@
"""Tests for :mod:`neuropose.ingest`.
Coverage:
- Happy path — nested and top-level videos produce one job each.
- Job-name derivation — flattening, sanitization, collapsing.
- Non-video members are skipped, not errors.
- Zip-internal collisions (two videos flattening to the same job
name) reported up front.
- External collisions (target job dir already exists) are listed in
one error; ``--force`` deletes and replaces.
- Security: path-traversal and absolute-path members refused; empty
archive and oversize archive refused.
- Atomicity: when extraction fails midway, no partial state is left
behind under ``input_dir``.
"""
from __future__ import annotations
import zipfile
from pathlib import Path
import pytest
from neuropose.ingest import (
ArchiveEmptyError,
ArchiveInvalidError,
ArchiveTooLargeError,
IngestResult,
JobCollisionError,
ingest_zip,
)
def _write_zip(path: Path, members: dict[str, bytes]) -> Path:
"""Create a zip at ``path`` with the given ``{name: bytes}`` members."""
with zipfile.ZipFile(path, "w") as z:
for name, data in members.items():
z.writestr(name, data)
return path
@pytest.fixture
def input_dir(tmp_path: Path) -> Path:
"""Return a fresh ``input_dir`` for the test."""
d = tmp_path / "jobs" / "in"
d.mkdir(parents=True)
return d
# ---------------------------------------------------------------------------
# Happy path
# ---------------------------------------------------------------------------
class TestHappyPath:
def test_top_level_video_becomes_job(self, tmp_path: Path, input_dir: Path) -> None:
archive = _write_zip(tmp_path / "a.zip", {"clip_01.mp4": b"data"})
result = ingest_zip(archive, input_dir)
assert result.job_count == 1
assert result.ingested[0].job_name == "clip_01"
assert (input_dir / "clip_01" / "clip_01.mp4").read_bytes() == b"data"
def test_nested_path_flattens_into_job_name(self, tmp_path: Path, input_dir: Path) -> None:
archive = _write_zip(
tmp_path / "a.zip",
{"patient_001/trial_01.mp4": b"vid"},
)
result = ingest_zip(archive, input_dir)
job = result.ingested[0]
assert job.job_name == "patient_001_trial_01"
# The video file inside the job dir keeps its basename, not the
# flattened job name, so the daemon sees a clean filename.
assert job.video_filename == "trial_01.mp4"
assert (input_dir / "patient_001_trial_01" / "trial_01.mp4").exists()
def test_sibling_nested_videos_do_not_collide(self, tmp_path: Path, input_dir: Path) -> None:
archive = _write_zip(
tmp_path / "a.zip",
{
"patient_001/trial_01.mp4": b"a",
"patient_002/trial_01.mp4": b"b",
},
)
result = ingest_zip(archive, input_dir)
names = {j.job_name for j in result.ingested}
assert names == {"patient_001_trial_01", "patient_002_trial_01"}
def test_multiple_videos_produce_multiple_jobs(self, tmp_path: Path, input_dir: Path) -> None:
archive = _write_zip(
tmp_path / "a.zip",
{f"clip_{i:02d}.mp4": f"d{i}".encode() for i in range(5)},
)
result = ingest_zip(archive, input_dir)
assert result.job_count == 5
assert len(list(input_dir.iterdir())) == 5
def test_non_video_members_skipped(self, tmp_path: Path, input_dir: Path) -> None:
archive = _write_zip(
tmp_path / "a.zip",
{
"clip.mp4": b"video",
"README.md": b"notes",
".DS_Store": b"junk",
"notes.txt": b"notes",
},
)
result = ingest_zip(archive, input_dir)
assert result.job_count == 1
assert sorted(result.skipped_non_videos) == sorted([".DS_Store", "README.md", "notes.txt"])
def test_all_accepted_extensions(self, tmp_path: Path, input_dir: Path) -> None:
archive = _write_zip(
tmp_path / "a.zip",
{
"a.mp4": b"a",
"b.avi": b"b",
"c.mov": b"c",
"d.mkv": b"d",
},
)
result = ingest_zip(archive, input_dir)
assert result.job_count == 4
def test_returns_typed_result(self, tmp_path: Path, input_dir: Path) -> None:
archive = _write_zip(tmp_path / "a.zip", {"clip.mp4": b"data"})
result = ingest_zip(archive, input_dir)
assert isinstance(result, IngestResult)
assert result.total_uncompressed_bytes == 4
# ---------------------------------------------------------------------------
# Job-name sanitization
# ---------------------------------------------------------------------------
class TestJobNameDerivation:
def test_special_chars_become_underscores(self, tmp_path: Path, input_dir: Path) -> None:
archive = _write_zip(
tmp_path / "a.zip",
{"session 2026-04-15 / trial @1.mp4": b"v"},
)
result = ingest_zip(archive, input_dir)
name = result.ingested[0].job_name
# Every character ends up in the safe set; runs of underscores
# are collapsed and leading/trailing stripped.
assert name == "session_2026-04-15_trial_1"
def test_all_symbol_name_falls_back_to_video(self, tmp_path: Path, input_dir: Path) -> None:
archive = _write_zip(tmp_path / "a.zip", {"!!!.mp4": b"v"})
result = ingest_zip(archive, input_dir)
assert result.ingested[0].job_name == "video"
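The derivation behaviour these tests describe (flatten the path, map unsafe characters to underscores, collapse runs, strip the ends, fall back to a default) can be sketched as below. This is an illustration consistent with the assertions above, not the actual `neuropose.ingest` code; the exact safe character set and fallback name are assumptions:

```python
import re

# Characters outside [A-Za-z0-9_-] are treated as unsafe (assumption).
_UNSAFE = re.compile(r"[^A-Za-z0-9_-]+")


def derive_job_name(member: str) -> str:
    """Derive a job name from a zip member path, per the tests above."""
    stem = member.rsplit(".", 1)[0]        # drop the extension
    name = stem.replace("/", "_")          # flatten nested paths
    name = _UNSAFE.sub("_", name)          # unsafe runs -> underscore
    name = re.sub(r"_+", "_", name)        # collapse underscore runs
    name = name.strip("_")                 # strip leading/trailing
    return name or "video"                 # all-symbol names fall back
```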
# ---------------------------------------------------------------------------
# Collision detection
# ---------------------------------------------------------------------------
class TestCollisions:
def test_zip_internal_collision_rejects(self, tmp_path: Path, input_dir: Path) -> None:
# "x__y.mp4" and "x y.mp4" both sanitize to the job name "x_y"
# (the space becomes an underscore and runs collapse), so the
# archive is internally inconsistent and must be rejected before
# anything is written.
archive = _write_zip(
tmp_path / "b.zip",
{"x__y.mp4": b"1", "x y.mp4": b"2"},
)
with pytest.raises(JobCollisionError):
ingest_zip(archive, input_dir)
# No files written.
assert list(input_dir.iterdir()) == []
def test_external_collision_without_force(self, tmp_path: Path, input_dir: Path) -> None:
(input_dir / "clip").mkdir()
(input_dir / "clip" / "existing.mp4").write_bytes(b"old")
archive = _write_zip(tmp_path / "a.zip", {"clip.mp4": b"new"})
with pytest.raises(JobCollisionError) as excinfo:
ingest_zip(archive, input_dir)
assert excinfo.value.collisions == ["clip"]
# Existing job dir is untouched.
assert (input_dir / "clip" / "existing.mp4").read_bytes() == b"old"
def test_external_collision_listed_together(self, tmp_path: Path, input_dir: Path) -> None:
for name in ("a", "b", "c"):
(input_dir / name).mkdir()
archive = _write_zip(
tmp_path / "a.zip",
{"a.mp4": b"1", "b.mp4": b"2", "c.mp4": b"3", "d.mp4": b"4"},
)
with pytest.raises(JobCollisionError) as excinfo:
ingest_zip(archive, input_dir)
assert sorted(excinfo.value.collisions) == ["a", "b", "c"]
def test_force_overwrites_existing(self, tmp_path: Path, input_dir: Path) -> None:
(input_dir / "clip").mkdir()
(input_dir / "clip" / "existing.mp4").write_bytes(b"old")
archive = _write_zip(tmp_path / "a.zip", {"clip.mp4": b"new"})
result = ingest_zip(archive, input_dir, force=True)
assert result.job_count == 1
# The old file is gone; only the new one remains.
files = list((input_dir / "clip").iterdir())
assert [f.name for f in files] == ["clip.mp4"]
assert (input_dir / "clip" / "clip.mp4").read_bytes() == b"new"
# ---------------------------------------------------------------------------
# Security
# ---------------------------------------------------------------------------
class TestSecurity:
def test_absolute_path_member_rejected(self, tmp_path: Path, input_dir: Path) -> None:
archive = _write_zip(tmp_path / "a.zip", {"/etc/passwd.mp4": b"x"})
with pytest.raises(ArchiveInvalidError, match="absolute"):
ingest_zip(archive, input_dir)
assert list(input_dir.iterdir()) == []
def test_traversal_member_rejected(self, tmp_path: Path, input_dir: Path) -> None:
archive = _write_zip(tmp_path / "a.zip", {"../escape.mp4": b"x"})
with pytest.raises(ArchiveInvalidError, match="traversal"):
ingest_zip(archive, input_dir)
assert list(input_dir.iterdir()) == []
def test_empty_archive_rejected(self, tmp_path: Path, input_dir: Path) -> None:
archive = _write_zip(tmp_path / "a.zip", {})
with pytest.raises(ArchiveEmptyError):
ingest_zip(archive, input_dir)
def test_archive_with_only_non_videos_rejected(self, tmp_path: Path, input_dir: Path) -> None:
archive = _write_zip(
tmp_path / "a.zip",
{"README.md": b"no videos here", "notes.txt": b"none"},
)
with pytest.raises(ArchiveEmptyError):
ingest_zip(archive, input_dir)
def test_too_large_archive_rejected(
self,
tmp_path: Path,
input_dir: Path,
monkeypatch: pytest.MonkeyPatch,
) -> None:
# Lower the cap for the test rather than building a real
# multi-GB zip. The enforcement path is the same.
monkeypatch.setattr("neuropose.ingest.MAX_UNCOMPRESSED_BYTES", 10)
archive = _write_zip(
tmp_path / "a.zip",
{"clip.mp4": b"0123456789ABCDEF"}, # 16 bytes > 10
)
with pytest.raises(ArchiveTooLargeError):
ingest_zip(archive, input_dir)
assert list(input_dir.iterdir()) == []
def test_bad_zip_file_rejected(self, tmp_path: Path, input_dir: Path) -> None:
bad = tmp_path / "bad.zip"
bad.write_bytes(b"this is not a valid zip file at all")
with pytest.raises(ArchiveInvalidError):
ingest_zip(bad, input_dir)
def test_missing_archive_raises(self, tmp_path: Path, input_dir: Path) -> None:
with pytest.raises(FileNotFoundError):
ingest_zip(tmp_path / "nope.zip", input_dir)
# ---------------------------------------------------------------------------
# Atomicity
# ---------------------------------------------------------------------------
class TestAtomicity:
def test_staging_directory_cleaned_up_on_success(self, tmp_path: Path, input_dir: Path) -> None:
archive = _write_zip(tmp_path / "a.zip", {"clip.mp4": b"v"})
ingest_zip(archive, input_dir)
# No stray `.ingest_*` directories left under the parent.
leftover = [p for p in input_dir.parent.iterdir() if p.name.startswith(".ingest_")]
assert leftover == []
def test_no_partial_state_when_planning_fails(self, tmp_path: Path, input_dir: Path) -> None:
# An archive that will pass the zipfile open but fail at
# planning (traversal member) should never write to input_dir.
archive = _write_zip(tmp_path / "a.zip", {"../bad.mp4": b"v"})
with pytest.raises(ArchiveInvalidError):
ingest_zip(archive, input_dir)
assert list(input_dir.iterdir()) == []
leftover = [p for p in input_dir.parent.iterdir() if p.name.startswith(".ingest_")]
assert leftover == []
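The atomicity tests above imply a stage-then-publish extraction scheme: write into a hidden `.ingest_*` staging directory next to `input_dir`, move job directories into place only after every member extracted cleanly, and always clean up the staging area. A self-contained sketch of that pattern (the `extract` callable and the exact staging layout are assumptions, not the real `neuropose.ingest` internals):

```python
import os
import shutil
import tempfile
from pathlib import Path


def ingest_atomically(extract, input_dir: Path) -> None:
    """Run ``extract(staging)`` and publish job dirs only on success.

    ``extract`` is a hypothetical callable that writes one directory
    per job into ``staging`` and raises on any invalid member.
    """
    staging = Path(tempfile.mkdtemp(prefix=".ingest_", dir=input_dir.parent))
    try:
        extract(staging)  # may raise midway; nothing published yet
        for job_dir in sorted(staging.iterdir()):
            # Same-filesystem rename: the job appears all at once.
            os.replace(job_dir, input_dir / job_dir.name)
    finally:
        shutil.rmtree(staging, ignore_errors=True)  # no .ingest_* leftovers
```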


@@ -45,8 +45,14 @@ class _RaisingEstimator:
     def load_model(self, cache_dir: Path | None = None) -> None:
         del cache_dir

-    def process_video(self, video_path: Path) -> Any:
-        del video_path
+    def process_video(
+        self,
+        video_path: Path,
+        *,
+        progress: Any = None,
+        **_: Any,
+    ) -> Any:
+        del video_path, progress
         raise self._exc
@@ -249,6 +255,142 @@ class TestProcessJobSuccess:
assert not (settings.failed_dir / "job_a").exists()
class TestProgressCheckpointing:
def test_completed_entry_has_full_progress(
self,
tmp_path: Path,
synthetic_video: Path,
fake_metrabs_model,
) -> None:
settings = _make_settings(tmp_path)
_prepare_job(settings, "job_a", videos=[synthetic_video])
interfacer = Interfacer(settings, Estimator(model=fake_metrabs_model))
entry = interfacer.process_job("job_a")
assert entry.percent_complete == 100.0
assert entry.videos_completed == 1
assert entry.videos_total == 1
assert entry.last_update is not None
def test_checkpoints_persist_to_status_file(
self,
tmp_path: Path,
synthetic_video: Path,
fake_metrabs_model,
) -> None:
# Force a low checkpoint cadence so the 5-frame synthetic video
# actually triggers the callback a few times. Without this the
# default cadence of 30 would not fire on a 5-frame video.
settings = Settings(
data_dir=tmp_path / "jobs",
model_cache_dir=tmp_path / "models",
status_checkpoint_every_frames=1,
)
_prepare_job(settings, "job_a", videos=[synthetic_video])
interfacer = Interfacer(settings, Estimator(model=fake_metrabs_model))
interfacer.process_job("job_a")
status = load_status(settings.status_file)
entry = status.root["job_a"]
# Final entry is COMPLETED with full progress.
assert entry.status == JobStatus.COMPLETED
assert entry.percent_complete == 100.0
def test_seed_checkpoint_sets_videos_total_before_first_callback(
self,
tmp_path: Path,
synthetic_video: Path,
fake_metrabs_model,
) -> None:
"""The seeded checkpoint should land even if no frame callbacks fire.
We wrap the real estimator so its ``process_video`` records the
``status.json`` state at the moment it is called (after the seed
checkpoint, before any frame-based callbacks run) and assert that
``videos_total`` and ``current_video`` were already written by the
seed.
"""
settings = _make_settings(tmp_path)
_prepare_job(settings, "job_a", videos=[synthetic_video])
seen_entries: list[JobStatusEntry] = []
real_estimator = Estimator(model=fake_metrabs_model)
class RecordingEstimator:
is_model_loaded = True
def load_model(self, cache_dir: Path | None = None) -> None:
del cache_dir
def process_video(
self,
video_path: Path,
*,
progress: Any = None,
**_: Any,
) -> Any:
status = load_status(settings.status_file)
seen_entries.append(status.root["job_a"])
return real_estimator.process_video(video_path, progress=progress)
interfacer = Interfacer(settings, RecordingEstimator()) # type: ignore[arg-type]
interfacer.process_job("job_a")
assert len(seen_entries) == 1
seeded = seen_entries[0]
assert seeded.videos_total == 1
assert seeded.videos_completed == 0
assert seeded.current_video == synthetic_video.name
def test_checkpoint_progress_swallows_io_errors(
self,
tmp_path: Path,
fake_metrabs_model,
monkeypatch: pytest.MonkeyPatch,
) -> None:
"""``_checkpoint_progress`` is best-effort — must not raise.
We call it directly and patch the underlying ``save_status``
to raise, then assert the call returns normally. Testing the
helper in isolation is more robust than trying to inject a
failure at the exact call-ordinal during a full ``process_job``
run.
"""
settings = _make_settings(tmp_path)
settings.ensure_dirs()
# Seed a PROCESSING entry so _checkpoint_progress has something
# to update.
status = StatusFile(
root={
"job_a": JobStatusEntry(
status=JobStatus.PROCESSING,
started_at=datetime.now(UTC),
)
}
)
save_status(settings.status_file, status)
interfacer = Interfacer(settings, Estimator(model=fake_metrabs_model))
def broken_save(path: Path, status: StatusFile) -> None:
raise OSError("disk full, simulated")
monkeypatch.setattr("neuropose.interfacer.save_status", broken_save)
# The helper must not raise — the caller (inference loop)
# continues making forward progress even if the write fails.
interfacer._checkpoint_progress(
"job_a",
started_at=datetime.now(UTC),
current_video="v.mp4",
frames_processed=10,
frames_total=100,
videos_completed=0,
videos_total=1,
)
# ---------------------------------------------------------------------------
# process_job failure paths
# ---------------------------------------------------------------------------


@@ -16,6 +16,7 @@ from neuropose.io import (
FramePrediction,
JobResults,
JobStatus,
JobStatusEntry,
JointAngleExtractor,
JointAxisExtractor,
JointPairDistanceExtractor,
@@ -525,3 +526,86 @@ class TestStatusFile:
}
}
)
class TestJobStatusEntryProgressFields:
def test_progress_fields_default_to_none(self) -> None:
entry = JobStatusEntry(
status=JobStatus.PROCESSING,
started_at=datetime(2026, 4, 13, tzinfo=UTC),
)
assert entry.current_video is None
assert entry.frames_processed is None
assert entry.frames_total is None
assert entry.videos_completed is None
assert entry.videos_total is None
assert entry.percent_complete is None
assert entry.last_update is None
def test_legacy_status_file_without_progress_loads(self, tmp_path: Path) -> None:
"""Files written before the progress fields existed must still load."""
started = datetime(2026, 4, 13, 12, 0, 0, tzinfo=UTC)
path = tmp_path / "legacy.json"
path.write_text(
json.dumps(
{
"job_001": {
"status": "completed",
"started_at": started.isoformat(),
"completed_at": started.isoformat(),
"results_path": "/tmp/results.json",
"error": None,
}
}
)
)
loaded = load_status(path)
entry = loaded.root["job_001"]
assert entry.percent_complete is None
def test_progress_roundtrips_through_json(self, tmp_path: Path) -> None:
now = datetime(2026, 4, 13, 12, 0, 0, tzinfo=UTC)
status = StatusFile(
root={
"job_001": JobStatusEntry(
status=JobStatus.PROCESSING,
started_at=now,
current_video="trial_01.mp4",
frames_processed=450,
frames_total=1200,
videos_completed=0,
videos_total=3,
percent_complete=12.5,
last_update=now,
),
}
)
path = tmp_path / "status.json"
save_status(path, status)
loaded = load_status(path)
entry = loaded.root["job_001"]
assert entry.current_video == "trial_01.mp4"
assert entry.frames_processed == 450
assert entry.percent_complete == 12.5
def test_percent_complete_rejects_out_of_range(self) -> None:
with pytest.raises(ValidationError):
JobStatusEntry(
status=JobStatus.PROCESSING,
started_at=datetime(2026, 4, 13, tzinfo=UTC),
percent_complete=150.0,
)
with pytest.raises(ValidationError):
JobStatusEntry(
status=JobStatus.PROCESSING,
started_at=datetime(2026, 4, 13, tzinfo=UTC),
percent_complete=-5.0,
)
def test_frames_processed_rejects_negative(self) -> None:
with pytest.raises(ValidationError):
JobStatusEntry(
status=JobStatus.PROCESSING,
started_at=datetime(2026, 4, 13, tzinfo=UTC),
frames_processed=-1,
)
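The roundtrip fixture above pairs ``frames_processed=450`` of ``1200`` and ``videos_completed=0`` of ``3`` with ``percent_complete=12.5``, which is consistent with a formula where finished videos count fully and the in-flight video contributes its frame fraction. A sketch of one such formula, inferred from the fixture values rather than taken from the interfacer's code:

```python
def overall_percent(
    videos_completed: int,
    frames_processed: int,
    frames_total: int,
    videos_total: int,
) -> float:
    """Overall job progress implied by the fixture values above.

    Each completed video contributes 1/videos_total; the current
    video contributes its frame fraction of that share.
    """
    in_flight = frames_processed / frames_total if frames_total else 0.0
    return 100.0 * (videos_completed + in_flight) / videos_total
```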

346 tests/unit/test_monitor.py Normal file

@@ -0,0 +1,346 @@
"""Tests for :mod:`neuropose.monitor`.
The tests boot the real :class:`http.server.HTTPServer` subclass on a
free ephemeral port in a background thread, issue real HTTP requests
with :mod:`urllib.request`, and assert on the responses. This
exercises the handler, routing, JSON serialization, HTML rendering,
and the query-parameter filter end-to-end exactly the surface
collaborators will hit.
"""
from __future__ import annotations
import json
import socket
import threading
import time
import urllib.error
import urllib.request
from datetime import UTC, datetime, timedelta
from http.server import HTTPServer
from pathlib import Path
import pytest
from neuropose.io import (
JobStatus,
JobStatusEntry,
StatusFile,
save_status,
)
from neuropose.monitor import (
STALE_THRESHOLD_SECONDS,
build_server,
render_status_html,
serve_forever,
)
# ---------------------------------------------------------------------------
# Helpers
# ---------------------------------------------------------------------------
def _free_port() -> int:
"""Grab an unused TCP port and release it for the caller to bind.
``HTTPServer`` does not offer a first-class "bind to an ephemeral
port" API that also exposes the chosen port to the caller before
serving. Doing a separate SO_REUSEADDR probe to pick one is
reliable enough for a test-only fixture.
"""
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
s.bind(("127.0.0.1", 0))
return int(s.getsockname()[1])
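The probe above leaves a small window in which another process could claim the port before ``build_server`` binds it. When the server object is constructed directly, binding to port 0 avoids the race entirely: the kernel chooses the port at bind time and it can be read back before serving. A sketch of that alternative (it assumes direct ``HTTPServer`` construction; whether ``build_server`` itself accepts ``port=0`` is not established by these tests):

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# port=0 asks the kernel to pick a free port at bind time; reading
# server_address afterwards closes the probe/bind race window.
server = HTTPServer(("127.0.0.1", 0), BaseHTTPRequestHandler)
chosen_port = server.server_address[1]  # already bound: no race
server.server_close()
```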
@pytest.fixture
def status_path(tmp_path: Path) -> Path:
"""Return a path to a populated status.json for the monitor tests."""
path = tmp_path / "status.json"
now = datetime.now(UTC)
status = StatusFile(
root={
"job_running": JobStatusEntry(
status=JobStatus.PROCESSING,
started_at=now - timedelta(minutes=2),
current_video="trial_01.mp4",
frames_processed=450,
frames_total=1200,
videos_completed=0,
videos_total=3,
percent_complete=12.5,
last_update=now,
),
"job_done": JobStatusEntry(
status=JobStatus.COMPLETED,
started_at=now - timedelta(minutes=10),
completed_at=now - timedelta(minutes=5),
results_path=Path("/tmp/out/job_done/results.json"),
videos_completed=2,
videos_total=2,
percent_complete=100.0,
last_update=now - timedelta(minutes=5),
),
"job_dead": JobStatusEntry(
status=JobStatus.FAILED,
started_at=now - timedelta(minutes=6),
completed_at=now - timedelta(minutes=5),
error="VideoDecodeError: OpenCV could not open video",
),
}
)
save_status(path, status)
return path
@pytest.fixture
def running_server(status_path: Path):
"""Boot the monitor in a background thread, yield the base URL."""
port = _free_port()
server = build_server(status_path, host="127.0.0.1", port=port)
thread = threading.Thread(target=server.serve_forever, daemon=True)
thread.start()
# Tiny settle delay so the first request doesn't race with
# serve_forever's setup. The stdlib server is synchronous enough
# that a few ms is plenty.
time.sleep(0.05)
try:
yield f"http://127.0.0.1:{port}"
finally:
server.shutdown()
server.server_close()
thread.join(timeout=2)
def _get(url: str, *, timeout: float = 2.0) -> tuple[int, bytes, dict[str, str]]:
try:
with urllib.request.urlopen(url, timeout=timeout) as resp:
return resp.status, resp.read(), dict(resp.headers)
except urllib.error.HTTPError as exc:
return exc.code, exc.read(), dict(exc.headers) if exc.headers else {}
# ---------------------------------------------------------------------------
# HTML rendering (pure function, no server)
# ---------------------------------------------------------------------------
class TestRenderStatusHtml:
def test_empty_status_shows_empty_state(self, tmp_path: Path) -> None:
html_text = render_status_html(
StatusFile(root={}),
status_path=tmp_path / "status.json",
now=datetime.now(UTC),
)
assert "No jobs tracked yet" in html_text
assert "auto-refresh" in html_text
def test_processing_entry_renders_progress_bar(self, status_path: Path) -> None:
from neuropose.io import load_status
status = load_status(status_path)
html_text = render_status_html(
status,
status_path=status_path,
now=datetime.now(UTC),
)
assert "job_running" in html_text
assert "<progress" in html_text
assert "12.5" in html_text
assert "trial_01.mp4" in html_text
def test_completed_entry_shows_100_percent(self, status_path: Path) -> None:
from neuropose.io import load_status
status = load_status(status_path)
html_text = render_status_html(
status,
status_path=status_path,
now=datetime.now(UTC),
)
assert "job_done" in html_text
assert "100" in html_text
def test_failed_entry_shows_error_message(self, status_path: Path) -> None:
from neuropose.io import load_status
status = load_status(status_path)
html_text = render_status_html(
status,
status_path=status_path,
now=datetime.now(UTC),
)
assert "job_dead" in html_text
assert "VideoDecodeError" in html_text
def test_error_message_is_html_escaped(self, tmp_path: Path) -> None:
now = datetime.now(UTC)
status = StatusFile(
root={
"x": JobStatusEntry(
status=JobStatus.FAILED,
started_at=now,
completed_at=now,
error="<script>alert('xss')</script>",
)
}
)
html_text = render_status_html(
status,
status_path=tmp_path / "status.json",
now=now,
)
assert "<script>" not in html_text
assert "&lt;script&gt;" in html_text
def test_stale_processing_entry_gets_badge(self, tmp_path: Path) -> None:
now = datetime.now(UTC)
stale_update = now - timedelta(seconds=STALE_THRESHOLD_SECONDS + 30)
status = StatusFile(
root={
"stuck": JobStatusEntry(
status=JobStatus.PROCESSING,
started_at=now - timedelta(hours=1),
current_video="video.mp4",
frames_processed=10,
frames_total=100,
videos_completed=0,
videos_total=1,
percent_complete=10.0,
last_update=stale_update,
),
}
)
html_text = render_status_html(status, status_path=tmp_path / "status.json", now=now)
# The badge text is "stale — no update for X s". The class
# 'stale' also appears in the inline <style> block, so we
# match the human-readable badge text to distinguish.
assert "no update for" in html_text
def test_fresh_processing_entry_has_no_stale_badge(self, tmp_path: Path) -> None:
now = datetime.now(UTC)
status = StatusFile(
root={
"ok": JobStatusEntry(
status=JobStatus.PROCESSING,
started_at=now - timedelta(seconds=5),
current_video="x.mp4",
frames_processed=5,
frames_total=100,
videos_completed=0,
videos_total=1,
percent_complete=5.0,
last_update=now,
),
}
)
html_text = render_status_html(status, status_path=tmp_path / "status.json", now=now)
# Same logic: the CSS class is always present, so we match on
# the badge text which is only injected when the entry is
# actually stale.
assert "no update for" not in html_text
# ---------------------------------------------------------------------------
# HTTP server
# ---------------------------------------------------------------------------
class TestHtmlRoute:
def test_root_returns_html(self, running_server: str) -> None:
status, body, headers = _get(f"{running_server}/")
assert status == 200
assert "text/html" in headers.get("Content-Type", "")
text = body.decode()
assert "NeuroPose job status" in text
assert "job_running" in text
assert "<progress" in text
def test_root_has_no_store_cache_control(self, running_server: str) -> None:
_, _, headers = _get(f"{running_server}/")
assert "no-store" in headers.get("Cache-Control", "")
class TestJsonRoute:
def test_status_json_returns_all_entries(self, running_server: str) -> None:
status, body, headers = _get(f"{running_server}/status.json")
assert status == 200
assert "application/json" in headers.get("Content-Type", "")
data = json.loads(body)
assert set(data.keys()) == {"job_running", "job_done", "job_dead"}
def test_status_json_filter_returns_single_entry(self, running_server: str) -> None:
status, body, _ = _get(f"{running_server}/status.json?job=job_running")
assert status == 200
data = json.loads(body)
assert data["status"] == "processing"
assert data["current_video"] == "trial_01.mp4"
assert data["percent_complete"] == 12.5
def test_status_json_filter_unknown_job_is_404(self, running_server: str) -> None:
status, _, _ = _get(f"{running_server}/status.json?job=nope")
assert status == 404
class TestHealthRoute:
def test_health_returns_ok(self, running_server: str) -> None:
status, body, _ = _get(f"{running_server}/health")
assert status == 200
assert json.loads(body) == {"status": "ok"}
class TestUnknownRoutes:
def test_unknown_path_is_404(self, running_server: str) -> None:
status, _, _ = _get(f"{running_server}/wat")
assert status == 404
# ---------------------------------------------------------------------------
# Monitor survives a missing status file
# ---------------------------------------------------------------------------
class TestMissingStatusFile:
def test_monitor_reads_missing_file_as_empty(self, tmp_path: Path) -> None:
port = _free_port()
path = tmp_path / "status.json" # deliberately not created
server = build_server(path, host="127.0.0.1", port=port)
thread = threading.Thread(target=server.serve_forever, daemon=True)
thread.start()
time.sleep(0.05)
try:
status, body, _ = _get(f"http://127.0.0.1:{port}/status.json")
assert status == 200
assert json.loads(body) == {}
finally:
server.shutdown()
server.server_close()
thread.join(timeout=2)
# ---------------------------------------------------------------------------
# serve_forever wrapper
# ---------------------------------------------------------------------------
class TestServeForever:
def test_serve_forever_shuts_down_on_keyboard_interrupt(
self,
status_path: Path,
monkeypatch: pytest.MonkeyPatch,
) -> None:
"""``serve_forever`` should catch KeyboardInterrupt and clean up.
We monkeypatch the stdlib HTTPServer's ``serve_forever`` to
immediately raise, which simulates a Ctrl-C before any request
is accepted. The wrapper should swallow it, log, and close the
server.
"""
def fake_serve_forever(self) -> None:
raise KeyboardInterrupt
monkeypatch.setattr(HTTPServer, "serve_forever", fake_serve_forever)
# Should return cleanly, not propagate the KeyboardInterrupt.
serve_forever(status_path, host="127.0.0.1", port=_free_port())