benchmarking and profiling scaffold

Levi Neuwirth 2026-04-14 20:32:29 -04:00
parent b9a5f0d54c
commit d29f4f1b78
19 changed files with 3861 additions and 44 deletions


@ -21,6 +21,8 @@ be split into per-release sections once tagging begins.
and `pyproject.toml` with full project metadata, classifiers, and
URL pointers. The runtime TensorFlow dependency is pinned to
`tensorflow>=2.16,<3.0` — see *Changed* below for the rationale.
- `psutil>=5.9` is a runtime dependency used by the estimator's
always-on `PerformanceMetrics` collection to sample peak RSS.
- `[project.optional-dependencies].analysis` extra for fastdtw, scipy,
scikit-learn, and sktime — install via `pip install neuropose[analysis]`.
- `[project.optional-dependencies].metal` extra pulling
@ -82,23 +84,48 @@ be split into per-release sections once tagging begins.
- **`neuropose.io`** — validated prediction schemas:
`FramePrediction` (frozen), `VideoMetadata` (frame count, fps,
width, height), `VideoPredictions` (metadata envelope + frames
mapping + optional `segmentations` field), `JobResults`,
`JobStatus` enum, `JobStatusEntry` (with a structured `error`
field), and `StatusFile`. Performance schema: frozen
`PerformanceMetrics` carrying per-call timings
(`model_load_seconds`, `total_seconds`, `per_frame_latencies_ms`),
`peak_rss_mb`, `active_device`, `tensorflow_metal_active`, and
`tensorflow_version`; `BenchmarkResult` pairing a discarded
`warmup_pass` with `measured_passes` and a `BenchmarkAggregate`
(mean / p50 / p95 / p99 per-frame latency, mean throughput, max
peak RSS); optional `CpuComparisonResult` nested inside
`BenchmarkResult` for `--compare-cpu` runs, carrying both
device aggregates, the throughput speedup, and the
maximum element-wise `poses3d` divergence in millimetres.
Segmentation schema: frozen `Segment` windows (`start`, `end`,
`peak`), `SegmentationConfig` (with a `method` version literal,
e.g. `valley_to_valley_v1`), a discriminated `ExtractorSpec`
union over `JointAxisExtractor`, `JointPairDistanceExtractor`,
`JointSpeedExtractor`, and `JointAngleExtractor`, and
`Segmentation` pairing a config with its segments so on-disk
results are self-describing. Load and save helpers with an atomic
tmp-file-then-rename pattern for every state file.
`load_benchmark_result` / `save_benchmark_result` follow the same
atomic pattern. `load_status` is deliberately crash-resilient:
missing, corrupt, or non-mapping JSON returns an empty
`StatusFile` rather than raising. Legacy predictions files
without the `segmentations` field deserialize cleanly to an
empty mapping.
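A minimal sketch of that forward-compatibility guarantee (the payload values are illustrative):

```python
from neuropose.io import VideoPredictions

# Hypothetical pre-segmentation payload: no "segmentations" key at all.
legacy = {
    "metadata": {"frame_count": 0, "fps": 30.0, "width": 1920, "height": 1080},
    "frames": {},
}
preds = VideoPredictions.model_validate(legacy)
assert preds.segmentations == {}  # defaults to an empty mapping
```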
- **`neuropose.estimator`** — `Estimator` class that streams frames
directly from OpenCV into the model, with no intermediate write-to-
disk-then-read-back-as-PNG round trip. Returns a typed
`ProcessVideoResult` containing a validated `VideoPredictions`
object and an always-populated `PerformanceMetrics` bundle (per-
frame latency in ms, total wall clock, peak RSS via `psutil`,
active TF device string, `tensorflow-metal` detection, TF
version, and model load time when the caller went through
`load_model()`). Does not touch the filesystem. Constructor
accepts an injected model for testability; `load_model()`
delegates to `neuropose._model.load_metrabs_model()`. Typed
exception hierarchy: `EstimatorError`, `ModelNotLoadedError`,
`VideoDecodeError`. Optional per-frame `progress` callback for
long videos. Frame identifier convention is `frame_000000`
(six-digit zero-pad, no extension — no file is implied).
- **`neuropose.visualize`** — `visualize_predictions()` for per-frame
2D + 3D overlay rendering. `matplotlib.use("Agg")` is called inside
the function rather than at module import, so `import neuropose.visualize`
@ -127,6 +154,21 @@ be split into per-release sections once tagging begins.
download was truncated). Post-load interface check for
`detect_poses`, `per_skeleton_joint_names`, and
`per_skeleton_joint_edges`.
- **`neuropose.benchmark`** — multi-pass inference benchmarking for
a single video. `run_benchmark()` runs `process_video` N times
(default 5), always discards the first pass as warmup (graph
compilation, file-system cache warmup), and aggregates the
remaining `PerformanceMetrics` into a `BenchmarkAggregate` with
mean / p50 / p95 / p99 per-frame latency, mean throughput, and
max peak RSS. `capture_reference=True` additionally preserves the
last measured pass's `VideoPredictions` in memory so the
`--compare-cpu` CLI flow can diff the `poses3d` arrays between a
GPU and CPU run. `compute_poses3d_divergence()` computes the
maximum element-wise absolute difference (in millimetres) between
two prediction sets, skipping frames with mismatched detection
counts and surfacing the `frame_count_compared` so callers can
tell if the number is trustworthy. `format_benchmark_report()`
renders a human-readable summary for CLI stdout.
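A sketch of how a caller might consume the divergence helper (the wrapper function is hypothetical):

```python
from neuropose.benchmark import compute_poses3d_divergence
from neuropose.io import VideoPredictions

def report_divergence(gpu: VideoPredictions, cpu: VideoPredictions) -> None:
    # Frames with mismatched detection counts are skipped, so the
    # compared count is the trustworthiness signal.
    max_mm, compared = compute_poses3d_divergence(gpu, cpu)
    print(f"max poses3d divergence: {max_mm:.4f} mm ({compared} frames compared)")
```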
- **`neuropose.analyzer`** — post-processing subpackage with lazy
imports for the heavy dependencies:
- `analyzer.dtw` — three DTW entry points (`dtw_all`,
@ -139,16 +181,53 @@ be split into per-release sections once tagging begins.
degenerate vectors), `extract_feature_statistics`
(`FeatureStatistics` frozen dataclass), and a `find_peaks` thin
wrapper around `scipy.signal.find_peaks`.
- `analyzer.segment` — repetition segmentation for trials in
which a subject performs the same movement several times. A
three-layer API: `segment_by_peaks` (pure 1D
valley-to-valley peak detection on a generic signal),
`segment_predictions` (top-level entry point taking a
`VideoPredictions` plus an `ExtractorSpec`, converting
time-based parameters to frame counts via `metadata.fps`), and
`slice_predictions` (split a `VideoPredictions` into one per
detected repetition with re-keyed frame names and a rewritten
`frame_count`). Ships four extractor factories —
`joint_axis`, `joint_pair_distance`, `joint_speed`, and
`joint_angle` — plus a `JOINT_NAMES` constant for the
berkeley_mhad_43 skeleton with a `joint_index(name)` lookup,
so post-processing callers can resolve `"rwri"` → integer
without loading the MeTRAbs SavedModel. A matching integration
test (`tests/integration/test_joint_names_drift.py`, marked
`slow`) loads the real model and asserts the constant still
matches, so any upstream skeleton drift fails CI.
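For example, resolving a joint name without the model (a minimal sketch):

```python
from neuropose.analyzer import JOINT_NAMES, joint_index

assert len(JOINT_NAMES) == 43
assert joint_index("rwri") == 23  # right wrist, no SavedModel load needed
```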
- **`neuropose.cli`** — Typer-based command-line interface with
five subcommands: `watch` (run the daemon), `process <video>`
(run the estimator on a single video), `segment <results>`
(post-hoc repetition segmentation — loads a JobResults or a
single VideoPredictions, runs
`neuropose.analyzer.segment.segment_predictions` with the chosen
extractor and thresholds, and atomically writes the file back
with the new segmentation attached under `--name`),
`benchmark <video>` (multi-pass inference benchmark — runs
`--repeats N` passes with a discarded first pass and
`--warmup-frames M` excluded from the head of each measured
pass, reports aggregates to stdout, and optionally writes a
structured `BenchmarkResult` to `--output`. Supports
`--compare-cpu` which spawns a `--force-cpu` subprocess, diffs
the resulting `poses3d` arrays, and reports throughput speedup
and max divergence in mm — the missing Apple Silicon numerical
verification answer from `RESEARCH.md`), and
`analyze <results>` (stub). The `segment` subcommand accepts
joint specifiers as either berkeley_mhad_43 names (`lwri`,
`rwri`, …) or integer indices, and refuses to overwrite an
existing segmentation of the same name without `--force`.
Global options `--config/-c`, `--verbose/-v`, `--quiet/-q`,
`--version`. Structured error handling turns expected exceptions
(`FileNotFoundError` on config, `ValidationError`,
`AlreadyRunningError`, `NotImplementedError`,
`KeyboardInterrupt`) into clear stderr messages and distinct
exit codes (`EXIT_OK=0`, `EXIT_USAGE=2`, `EXIT_PENDING=3`,
`EXIT_INTERRUPTED=130`). The CLI entry point is wired in
`[project.scripts]` as `neuropose = "neuropose.cli:run"`.
#### Documentation

docs/api/benchmark.md (new file)

@ -0,0 +1,3 @@
# `neuropose.benchmark`
::: neuropose.benchmark

docs/api/segment.md (new file)

@ -0,0 +1,3 @@
# `neuropose.analyzer.segment`
::: neuropose.analyzer.segment


@ -30,7 +30,8 @@ them are defined by validated pydantic schemas in
### estimator
**Role:** pure inference library. Given a video path and a MeTRAbs
model, produces a validated `VideoPredictions` object plus a
`PerformanceMetrics` bundle.
**Does NOT handle:** job directories, status files, polling, locking,
signal handling, visualization, or anywhere-to-save decisions. It is a
@ -39,11 +40,34 @@ library, not a daemon.
The estimator streams frames directly from OpenCV into the model — no
intermediate write-to-disk-then-read-back-as-PNG round trip like the
previous prototype had. `process_video()` returns a typed
`ProcessVideoResult` containing the predictions and an
always-populated `PerformanceMetrics` (per-frame latency, peak RSS,
total wall clock, active TF device, TF version, `tensorflow-metal`
detection, and model load time when the caller went through
`load_model()`). It does not touch the filesystem unless the caller
explicitly asks it to save the result.
See [`neuropose.estimator`](api/estimator.md) for the API reference.
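A minimal usage sketch (the video path is illustrative, and the no-argument constructor is assumed):

```python
from pathlib import Path

from neuropose.estimator import Estimator

estimator = Estimator()  # the constructor also accepts an injected model for tests
estimator.load_model()
result = estimator.process_video(Path("trial.mp4"))

metrics = result.metrics  # always populated, never None
print(metrics.active_device, metrics.tensorflow_version)
print(f"peak RSS: {metrics.peak_rss_mb:.0f} MB in {metrics.total_seconds:.1f} s")
```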
### benchmark
**Role:** multi-pass inference benchmarking layered on top of the
estimator. `run_benchmark()` calls `process_video` N times, discards
the first pass as warmup, and aggregates the remaining
`PerformanceMetrics` into a `BenchmarkAggregate` with distributional
statistics (mean / p50 / p95 / p99 per-frame latency, mean
throughput, max peak RSS).
The benchmark is exposed via the `neuropose benchmark <video>` CLI
subcommand. Its `--compare-cpu` flag spawns a subprocess with GPU
visibility hidden (via `tf.config.set_visible_devices([], "GPU")`
before any TF op) so a Metal-backed Apple Silicon run can be diffed
against a CPU baseline — both the throughput speedup and the maximum
element-wise `poses3d` divergence in millimetres are surfaced in the
output. This is the "is `tensorflow-metal` producing correct
numerics?" check that `RESEARCH.md`'s TensorFlow-version-compatibility
section leaves open for v0.1.
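A sketch of the Python surface underneath that subcommand (the path is illustrative):

```python
from pathlib import Path

from neuropose.benchmark import format_benchmark_report, run_benchmark
from neuropose.estimator import Estimator

estimator = Estimator()
estimator.load_model()
outcome = run_benchmark(
    estimator,
    Path("trial.mp4"),
    repeats=5,        # pass 1 is always discarded as warmup
    warmup_frames=3,  # head frames excluded from each measured pass
)
print(format_benchmark_report(outcome.result))
```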
### interfacer
**Role:** job-lifecycle daemon. Watches `input_dir` for new job
@ -74,16 +98,37 @@ Key guarantees:
See [`neuropose.interfacer`](api/interfacer.md) for the API reference.
### analyzer
**Role:** post-processing. Takes a `results.json` and produces analysis
output (DTW comparisons, joint-angle features, repetition segmentation,
classification). Each piece is a pure function of the predictions, so
the module is a set of testable utilities rather than a daemon.
Three submodules ship today:
- `analyzer.features` — `predictions_to_numpy`, normalization,
padding, joint angles, summary statistics, and a thin
`scipy.signal.find_peaks` wrapper.
- `analyzer.dtw` — three DTW entry points (`dtw_all`, `dtw_per_joint`,
`dtw_relation`) over `fastdtw`, with a frozen `DTWResult` dataclass.
See `RESEARCH.md` for the ongoing methodology discussion.
- `analyzer.segment` — **repetition segmentation**. Given a
`VideoPredictions` of a trial in which the subject performs the
same movement several times (e.g. lifting a cup repeatedly), the
module detects the individual repetitions as
`[start, peak, end)` windows via valley-to-valley peak detection
on a clinically chosen 1D signal. The signal is one of four
extractor variants (`joint_axis`, `joint_pair_distance`,
`joint_speed`, `joint_angle`), and the produced `Segmentation`
carries its own `SegmentationConfig` so the on-disk
representation is self-describing. Segmentation is exposed both
as a Python API and as the `neuropose segment` CLI subcommand,
which runs post-hoc against an existing `results.json` — the
daemon stays a pure inference daemon.
Classification wrappers on top of `sktime` are deliberately not
shipped yet; see `RESEARCH.md` for the plan.
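Putting the segmentation pieces together, a minimal sketch (the extractor choice and thresholds are illustrative):

```python
from neuropose.analyzer import (
    JOINT_INDEX,
    joint_pair_distance,
    segment_predictions,
    slice_predictions,
)

def split_into_reps(predictions):
    # Wrist separation is one plausible segmentation signal; pick the
    # extractor that tracks the movement being measured.
    seg = segment_predictions(
        predictions,
        joint_pair_distance(JOINT_INDEX["lwri"], JOINT_INDEX["rwri"]),
        min_distance_seconds=0.5,
        min_prominence=50.0,
    )
    return slice_predictions(predictions, seg.segments)
```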
## Data flow


@ -94,6 +94,8 @@ nav:
- neuropose.estimator: api/estimator.md
- neuropose.interfacer: api/interfacer.md
- neuropose.io: api/io.md
- neuropose.benchmark: api/benchmark.md
- neuropose.analyzer.segment: api/segment.md
- neuropose.visualize: api/visualize.md
- Development: development.md
- Deployment: deployment.md


@ -55,6 +55,12 @@ dependencies = [
"numpy>=1.26",
"opencv-python-headless>=4.9",
"matplotlib>=3.8",
# psutil powers the peak-RSS and process-memory readings captured by
# PerformanceMetrics in neuropose.io. It is a cross-platform wrapper
# around /proc (Linux), mach (macOS), and Win32 APIs, with a small C
# extension per platform; brought in at runtime because metrics are
# always collected, not an optional feature.
"psutil>=5.9",
"tensorflow>=2.16,<3.0",
"tensorflow-hub>=0.16",
]


@ -42,8 +42,23 @@ from neuropose.analyzer.features import (
pad_sequences,
predictions_to_numpy,
)
from neuropose.analyzer.segment import (
JOINT_INDEX,
JOINT_NAMES,
extract_signal,
joint_angle,
joint_axis,
joint_index,
joint_pair_distance,
joint_speed,
segment_by_peaks,
segment_predictions,
slice_predictions,
)
__all__ = [
"JOINT_INDEX",
"JOINT_NAMES",
"DTWResult",
"FeatureStatistics",
"dtw_all",
@ -51,8 +66,17 @@ __all__ = [
"dtw_relation",
"extract_feature_statistics",
"extract_joint_angles",
"extract_signal",
"find_peaks",
"joint_angle",
"joint_axis",
"joint_index",
"joint_pair_distance",
"joint_speed",
"normalize_pose_sequence",
"pad_sequences",
"predictions_to_numpy",
"segment_by_peaks",
"segment_predictions",
"slice_predictions",
]


@ -0,0 +1,590 @@
"""Repetition segmentation for pose sequences.
Given a :class:`~neuropose.io.VideoPredictions` of a trial in which the
subject performs the same movement several times (e.g. lifting a cup
repeatedly), this module detects the individual repetitions and returns
them as a :class:`~neuropose.io.Segmentation` that can be persisted back
into ``results.json``.
The detection is a two-step pipeline:
1. **Extract a 1D segmentation signal** from the per-frame pose array,
using one of the :class:`~neuropose.io.ExtractorSpec` variants
(:class:`~neuropose.io.JointAxisExtractor`,
:class:`~neuropose.io.JointPairDistanceExtractor`,
:class:`~neuropose.io.JointSpeedExtractor`, or
:class:`~neuropose.io.JointAngleExtractor`).
2. **Walk peaks to their neighbouring valleys** to form one ``[start,
   peak, end)`` window per repetition. The "valley-to-valley" method
   assumes the subject comes to rest between repetitions; the recording
   protocol for NeuroPose's clinical use cases makes this assumption
   reasonable. See :attr:`neuropose.io.SegmentationConfig.method` for
   the version stamp that travels with persisted segmentations.
Three layers of API are provided, in increasing order of convenience:
- :func:`segment_by_peaks` — pure 1D signal segmentation, no pose or
  metadata awareness. All parameters are in sample counts.
- :func:`segment_predictions` — the top-level entry point. Takes a
:class:`~neuropose.io.VideoPredictions` plus an
:class:`~neuropose.io.ExtractorSpec`, converts time-based parameters
to frame counts using ``metadata.fps``, and returns a full
:class:`~neuropose.io.Segmentation` ready to attach to the predictions.
- :func:`slice_predictions` — split a :class:`~neuropose.io.VideoPredictions`
into one per-repetition :class:`~neuropose.io.VideoPredictions`,
useful when downstream code wants per-rep objects rather than windows
over the original.
The four :class:`~neuropose.io.ExtractorSpec` variants have convenience
factories (:func:`joint_axis`, :func:`joint_pair_distance`,
:func:`joint_speed`, :func:`joint_angle`) re-exported at the subpackage
root so that Python callers never have to import the pydantic classes
directly::
from neuropose.analyzer import (
segment_predictions,
joint_pair_distance,
JOINT_INDEX,
)
seg = segment_predictions(
predictions,
joint_pair_distance(JOINT_INDEX["lwri"], JOINT_INDEX["rwri"]),
min_distance_seconds=0.5,
min_prominence=50.0,
)
Dependency note
---------------
This module requires :mod:`scipy` for peak detection, which is part of
the ``analysis`` optional extra. The import is lazy so that
``import neuropose.analyzer.segment`` succeeds even when scipy is not
installed; a clear :class:`ImportError` surfaces at the first call to
:func:`segment_by_peaks` or :func:`segment_predictions`.
"""
from __future__ import annotations
from collections.abc import Sequence
import numpy as np
from neuropose.analyzer.features import predictions_to_numpy
from neuropose.io import (
ExtractorSpec,
JointAngleExtractor,
JointAxisExtractor,
JointPairDistanceExtractor,
JointSpeedExtractor,
Segment,
Segmentation,
SegmentationConfig,
VideoMetadata,
VideoPredictions,
)
# ---------------------------------------------------------------------------
# berkeley_mhad_43 joint names
# ---------------------------------------------------------------------------
#
# The MeTRAbs SavedModel we pin in ``neuropose._model`` exposes a 43-joint
# skeleton named ``berkeley_mhad_43``. The names below are captured
# verbatim from that model and committed as a constant so that
# post-processing code (CLI flags, analyzer helpers) can translate
# human-readable joint names into indices without having to load a
# multi-gigabyte TensorFlow model just to resolve ``"rwri"``.
#
# An integration test (:mod:`tests.integration.test_joint_names_match_model`)
# asserts that this tuple still matches what the loaded model reports, so
# any upstream change in the MeTRAbs skeleton will fail CI the next time
# the slow tests run. When that happens, the expected fix is:
#
# 1. Update this tuple in the same commit that bumps the model pin in
# ``neuropose._model``, and
# 2. Cross-check any CLI or docs that embed hardcoded joint names.
JOINT_NAMES: tuple[str, ...] = (
"head",
"lhead",
"rhead",
"rback",
"backl",
"backt",
"lback",
"lside",
"bell",
"chest",
"rside",
"lsho1",
"lsho2",
"larm",
"lelb",
"lwri",
"lhan1",
"lhan2",
"lhan3",
"rsho1",
"rsho2",
"rarm",
"relb",
"rwri",
"rhan1",
"rhan2",
"rhan3",
"lhipb",
"lhipf",
"lhipl",
"lleg",
"lkne",
"lank",
"lhee",
"lfoo",
"rhipb",
"rhipf",
"rhipl",
"rleg",
"rkne",
"rank",
"rhee",
"rfoo",
)
JOINT_INDEX: dict[str, int] = {name: idx for idx, name in enumerate(JOINT_NAMES)}
def joint_index(name: str) -> int:
"""Return the integer index of ``name`` in the berkeley_mhad_43 skeleton.
Parameters
----------
name
Joint name as it appears in the MeTRAbs SavedModel
``per_skeleton_joint_names["berkeley_mhad_43"]`` tensor — for
example ``"rwri"`` for the right wrist or ``"lkne"`` for the
left knee.
Returns
-------
int
The 0-based index into a ``(frames, 43, 3)`` pose sequence.
Raises
------
KeyError
If ``name`` is not one of the 43 known joint names. The error
message lists all valid names to make recovery obvious.
"""
try:
return JOINT_INDEX[name]
except KeyError:
raise KeyError(
f"unknown joint name: {name!r}. Known names: {sorted(JOINT_INDEX)}"
) from None
# ---------------------------------------------------------------------------
# Extractor factories
# ---------------------------------------------------------------------------
#
# Thin constructors for the four ExtractorSpec variants. Callers *could*
# instantiate the pydantic models directly, but the factories read a bit
# better at call sites and match the ergonomic pattern set by the rest
# of the analyzer subpackage (e.g. ``joint_pair_distance(j1, j2)`` reads
# like a function call rather than a schema construction).
def joint_axis(joint: int, axis: int, *, invert: bool = False) -> JointAxisExtractor:
"""Construct a :class:`~neuropose.io.JointAxisExtractor`.
Parameters
----------
joint
Joint index into the ``(frames, J, 3)`` pose sequence.
axis
Spatial axis: ``0`` for x, ``1`` for y, ``2`` for z.
invert
If ``True``, negate the signal so valleys become peaks. Useful
when the movement of interest is a *decrease* in the selected
coordinate (e.g. a joint that dips during the repetition).
"""
return JointAxisExtractor(joint=joint, axis=axis, invert=invert)
def joint_pair_distance(j1: int, j2: int) -> JointPairDistanceExtractor:
"""Construct a :class:`~neuropose.io.JointPairDistanceExtractor`.
The two joints must be distinct; pydantic validation enforces this.
"""
return JointPairDistanceExtractor(joints=(j1, j2))
def joint_speed(joint: int) -> JointSpeedExtractor:
"""Construct a :class:`~neuropose.io.JointSpeedExtractor`."""
return JointSpeedExtractor(joint=joint)
def joint_angle(a: int, b: int, c: int) -> JointAngleExtractor:
"""Construct a :class:`~neuropose.io.JointAngleExtractor`.
The angle is computed at joint ``b`` between the vectors ``(a - b)``
and ``(c - b)``.
"""
return JointAngleExtractor(triplet=(a, b, c))
# ---------------------------------------------------------------------------
# Signal extraction
# ---------------------------------------------------------------------------
def extract_signal(sequence: np.ndarray, spec: ExtractorSpec) -> np.ndarray:
"""Reduce a pose sequence to a 1D segmentation signal per the ``spec``.
Dispatches on the discriminator ``kind`` of ``spec``. All variants
produce a 1D array whose length equals ``sequence.shape[0]``, so the
segmentation engine can treat the output uniformly regardless of
which extractor produced it.
Parameters
----------
sequence
Array of shape ``(frames, joints, 3)`` — the output of
:func:`neuropose.analyzer.features.predictions_to_numpy`.
spec
One of the :class:`~neuropose.io.ExtractorSpec` variants.
Returns
-------
numpy.ndarray
A 1D ``float`` array of length ``frames``.
Raises
------
ValueError
If ``sequence`` is not a ``(frames, joints, 3)`` array, if any
joint index in ``spec`` is out of range, or if
``JointSpeedExtractor`` is applied to a single-frame sequence.
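
    Examples
    --------
    A quick shape check on a synthetic sequence (a sketch; joint index
    15 is ``"lwri"`` in the berkeley_mhad_43 skeleton):

    >>> seq = np.zeros((10, 43, 3))
    >>> seq[:, 15, 1] = np.arange(10.0)
    >>> extract_signal(seq, joint_axis(15, 1)).shape
    (10,)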
"""
if sequence.ndim != 3 or sequence.shape[-1] != 3:
raise ValueError(f"expected (frames, joints, 3); got shape {sequence.shape}")
num_frames, num_joints, _ = sequence.shape
def _check_joint(idx: int) -> None:
if not (0 <= idx < num_joints):
raise ValueError(f"joint index {idx} out of range [0, {num_joints})")
if isinstance(spec, JointAxisExtractor):
_check_joint(spec.joint)
signal = sequence[:, spec.joint, spec.axis].astype(float, copy=True)
if spec.invert:
signal = -signal
return signal
if isinstance(spec, JointPairDistanceExtractor):
j1, j2 = spec.joints
_check_joint(j1)
_check_joint(j2)
delta = sequence[:, j1, :] - sequence[:, j2, :]
return np.linalg.norm(delta, axis=1).astype(float, copy=False)
if isinstance(spec, JointSpeedExtractor):
_check_joint(spec.joint)
if num_frames < 2:
raise ValueError("joint_speed requires at least two frames")
diffs = np.diff(sequence[:, spec.joint, :], axis=0)
mags = np.linalg.norm(diffs, axis=1)
# Pad the first frame with 0 so the signal length matches the
# input frame count; segmentation indices then line up with the
# original frame indices without an off-by-one.
return np.concatenate([[0.0], mags]).astype(float, copy=False)
if isinstance(spec, JointAngleExtractor):
# Deferred import to avoid a circular dependency — features.py
# and segment.py both live under analyzer/, and extract_joint_angles
# is defined in features.py.
from neuropose.analyzer.features import extract_joint_angles
a, b, c = spec.triplet
_check_joint(a)
_check_joint(b)
_check_joint(c)
return extract_joint_angles(sequence, [(a, b, c)])[:, 0].astype(float, copy=False)
raise TypeError(f"unknown extractor kind: {type(spec).__name__}")
# ---------------------------------------------------------------------------
# Layer 1: pure 1D segmentation
# ---------------------------------------------------------------------------
def segment_by_peaks(
signal: np.ndarray,
*,
min_distance: int | None = None,
min_prominence: float | None = None,
min_height: float | None = None,
pad: int = 0,
) -> list[Segment]:
"""Segment a 1D signal into one window per detected peak.
Implements the ``valley_to_valley_v1`` method: each peak found by
:func:`scipy.signal.find_peaks` is walked outward to the nearest
valley on each side (a local minimum of the signal), and the
resulting ``[start, end)`` window is reported as one
:class:`~neuropose.io.Segment`. If the signal has no valley before
the first peak the segment starts at frame 0; likewise a trailing
peak without a following valley extends to the end of the signal.
Parameters
----------
signal
1D numpy array of segmentation feature values (e.g. wrist
height across frames).
min_distance, min_prominence, min_height
Forwarded as ``distance``, ``prominence`` and ``height`` to
:func:`scipy.signal.find_peaks`. Use these to reject noise; the
defaults are permissive.
pad
Number of samples to extend each resulting segment on both
sides. Segments are clamped to ``[0, len(signal)]``; adjacent
padded segments may therefore overlap.
Returns
-------
list[Segment]
One :class:`~neuropose.io.Segment` per detected repetition, in
ascending order of peak index. Empty if no peaks were found.
Raises
------
ValueError
If ``signal`` is not 1D or if ``pad`` is negative.
ImportError
If :mod:`scipy` is not installed. The error message points at
the ``analysis`` optional extra.
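
    Examples
    --------
    A sketch on a synthetic four-bump signal (requires scipy; the
    prominence threshold is illustrative):

    >>> import numpy as np
    >>> t = np.linspace(0.0, 4.0 * np.pi, 400)
    >>> segments = segment_by_peaks(np.sin(t) ** 2, min_prominence=0.5)
    >>> len(segments)
    4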
"""
if signal.ndim != 1:
raise ValueError(f"expected 1D array; got shape {signal.shape}")
if pad < 0:
raise ValueError(f"pad must be non-negative; got {pad}")
try:
from scipy.signal import find_peaks as _sp_find_peaks
except ImportError as exc:
raise ImportError(
"neuropose.analyzer.segment.segment_by_peaks requires scipy. "
"Install it with: pip install neuropose[analysis]"
) from exc
peak_kwargs: dict[str, float | int] = {}
# scipy requires ``distance >= 1``; treat ``None`` or 0 as "no constraint".
if min_distance is not None and min_distance >= 1:
peak_kwargs["distance"] = min_distance
if min_prominence is not None:
peak_kwargs["prominence"] = min_prominence
if min_height is not None:
peak_kwargs["height"] = min_height
peaks, _ = _sp_find_peaks(signal, **peak_kwargs)
if len(peaks) == 0:
return []
# Valleys = peaks of the negated signal. No extra filters — we want
# every local minimum as a candidate boundary, and we'll pick the
# nearest one on each side of each qualifying peak.
valleys, _ = _sp_find_peaks(-signal)
n = int(signal.shape[0])
segments: list[Segment] = []
for peak in peaks:
peak_idx = int(peak)
left = valleys[valleys < peak_idx]
right = valleys[valleys > peak_idx]
start = int(left.max()) if left.size > 0 else 0
# ``end`` is exclusive; include the trailing valley frame so
# the segment captures the return-to-rest the clinician cares
# about.
end = int(right.min()) + 1 if right.size > 0 else n
start = max(0, start - pad)
end = min(n, end + pad)
segments.append(Segment(start=start, end=end, peak=peak_idx))
return segments
# ---------------------------------------------------------------------------
# Layer 2: pose-aware convenience
# ---------------------------------------------------------------------------
def segment_predictions(
predictions: VideoPredictions,
extractor: ExtractorSpec,
*,
person_index: int = 0,
min_distance_seconds: float | None = None,
min_prominence: float | None = None,
min_height: float | None = None,
pad_seconds: float = 0.0,
) -> Segmentation:
"""Segment a :class:`~neuropose.io.VideoPredictions` into repetitions.
This is the top-level entry point for post-hoc segmentation. It:
1. Reduces ``predictions`` to a ``(frames, joints, 3)`` numpy array
via :func:`~neuropose.analyzer.features.predictions_to_numpy`.
2. Extracts a 1D segmentation signal using ``extractor``.
3. Converts the time-based parameters (``min_distance_seconds``,
``pad_seconds``) to frame counts using
``predictions.metadata.fps``.
4. Delegates to :func:`segment_by_peaks`.
5. Wraps the result in a :class:`~neuropose.io.Segmentation` whose
:class:`~neuropose.io.SegmentationConfig` carries the original
(time-based) parameters and the extractor spec, so the output
is self-describing when persisted.
Parameters
----------
predictions
Per-video predictions to segment. The ``metadata.fps`` field
is used to convert seconds to frames; callers with a video of
unknown frame rate should fall back to :func:`segment_by_peaks`
directly.
extractor
Serializable extractor spec typically produced by one of the
convenience factories (:func:`joint_axis`,
:func:`joint_pair_distance`, :func:`joint_speed`,
:func:`joint_angle`).
person_index
Which detected person to extract from each frame. Defaults to
0 (the first detected person), matching the single-subject
clinical case.
min_distance_seconds, min_prominence, min_height, pad_seconds
Forwarded to :func:`segment_by_peaks` after the time-based
parameters are converted to sample counts via ``metadata.fps``.
Returns
-------
Segmentation
A :class:`~neuropose.io.Segmentation` pairing the segments with
the exact :class:`~neuropose.io.SegmentationConfig` that
produced them. Ready to attach under a name to
``VideoPredictions.segmentations``.
Raises
------
ValueError
If ``predictions`` has zero frames, if a joint index in the
extractor is out of range, or if ``predictions.metadata.fps``
is non-positive and time-based parameters were supplied.
ImportError
If :mod:`scipy` is not installed.
"""
sequence = predictions_to_numpy(predictions, person_index=person_index)
signal = extract_signal(sequence, extractor)
fps = predictions.metadata.fps
needs_fps = (
min_distance_seconds is not None and min_distance_seconds > 0.0
) or pad_seconds > 0.0
if needs_fps and fps <= 0.0:
raise ValueError(
"cannot convert seconds to frames: metadata.fps is "
f"{fps}. Pass min_distance_seconds=None and pad_seconds=0.0, "
"or call segment_by_peaks directly with sample-count units."
)
if min_distance_seconds is None or min_distance_seconds <= 0.0:
min_distance: int | None = None
else:
min_distance = max(1, round(min_distance_seconds * fps))
pad = round(pad_seconds * fps) if pad_seconds > 0.0 else 0
segments = segment_by_peaks(
signal,
min_distance=min_distance,
min_prominence=min_prominence,
min_height=min_height,
pad=pad,
)
config = SegmentationConfig(
extractor=extractor,
person_index=person_index,
min_distance_seconds=min_distance_seconds,
min_prominence=min_prominence,
min_height=min_height,
pad_seconds=pad_seconds,
)
return Segmentation(config=config, segments=segments)
# ---------------------------------------------------------------------------
# Slicing: one VideoPredictions per segment
# ---------------------------------------------------------------------------
def slice_predictions(
predictions: VideoPredictions,
segments: Sequence[Segment],
) -> list[VideoPredictions]:
"""Split a :class:`~neuropose.io.VideoPredictions` into one per segment.
Each output :class:`~neuropose.io.VideoPredictions` contains only
the frames in ``[segment.start, segment.end)`` of the input,
re-keyed starting at ``frame_000000`` and with
:class:`~neuropose.io.VideoMetadata.frame_count` rewritten to match.
Other metadata fields (``fps``, ``width``, ``height``) are copied
verbatim. The input is not modified.
The per-slice ``segmentations`` field is intentionally reset to
empty; a segment is meaningful only against its parent video's
timeline, and copying the parent's segmentations into a sliced view
would be confusing rather than useful.
Parameters
----------
predictions
Source predictions.
segments
Segments to slice out. Indices are interpreted into the input's
``frame_names()`` ordering.
Returns
-------
list[VideoPredictions]
One :class:`~neuropose.io.VideoPredictions` per segment, in the
order of the input list.
Raises
------
ValueError
If any segment's ``end`` exceeds the input frame count.
"""
frame_names = predictions.frame_names()
total_frames = len(frame_names)
results: list[VideoPredictions] = []
for idx, seg in enumerate(segments):
if seg.end > total_frames:
raise ValueError(f"segment {idx} end={seg.end} exceeds frame count {total_frames}")
sliced_frames = {
f"frame_{i:06d}": predictions[src_name]
for i, src_name in enumerate(frame_names[seg.start : seg.end])
}
new_metadata = VideoMetadata(
frame_count=seg.end - seg.start,
fps=predictions.metadata.fps,
width=predictions.metadata.width,
height=predictions.metadata.height,
)
results.append(
VideoPredictions(
metadata=new_metadata,
frames=sliced_frames,
segmentations={},
)
)
return results

src/neuropose/benchmark.py (new file)

@ -0,0 +1,320 @@
"""Multi-pass inference benchmarking for :mod:`neuropose.estimator`.
The :mod:`neuropose.estimator` module already instruments every call
and attaches a :class:`~neuropose.io.PerformanceMetrics` to its
:class:`~neuropose.estimator.ProcessVideoResult`, so real-world runs
always carry their own timing. This module layers on top of that to
answer the harder question *what can this machine actually do
steady-state?* by running the same video multiple times, throwing
out the first pass entirely (graph compilation / file-system cache
warmup), and aggregating the rest into distributional statistics.
The design is intentionally thin: a single :func:`run_benchmark`
entry point, a small divergence helper for the ``--compare-cpu``
flow, and a pretty-printer used by the CLI. The heavy lifting —
schemas, aggregation, serialisation — lives in :mod:`neuropose.io`.
See :class:`neuropose.io.BenchmarkResult` for the result shape and
:mod:`neuropose.cli.benchmark` for the command-line surface.
"""
from __future__ import annotations
from dataclasses import dataclass
from pathlib import Path
import numpy as np
from neuropose.estimator import Estimator
from neuropose.io import (
BenchmarkAggregate,
BenchmarkResult,
PerformanceMetrics,
VideoPredictions,
)
@dataclass(frozen=True)
class BenchmarkRunOutcome:
"""Return type of :func:`run_benchmark`.
Separate from :class:`neuropose.io.BenchmarkResult` because the
reference :class:`~neuropose.io.VideoPredictions` captured from the
last measured pass is **not** part of the serialisable benchmark
record — it is kept in memory only to feed the ``--compare-cpu``
divergence computation, and would otherwise bloat every JSON output
with redundant pose data.
"""
result: BenchmarkResult
reference_predictions: VideoPredictions | None
def run_benchmark(
estimator: Estimator,
video_path: Path,
*,
repeats: int = 5,
warmup_frames: int = 3,
capture_reference: bool = False,
) -> BenchmarkRunOutcome:
"""Run ``repeats`` passes of ``process_video`` and aggregate timings.
Parameters
----------
estimator
A fully-initialised :class:`~neuropose.estimator.Estimator`
with its model loaded (or injected). The benchmark calls
``estimator.process_video(video_path)`` in a loop.
video_path
Path to an input video. Must be readable by OpenCV; any
:class:`~neuropose.estimator.VideoDecodeError` from the first
pass propagates out (no point retrying a broken file).
repeats
Total number of passes to run. Must be at least 2 — the first
pass is always discarded, and the aggregate needs at least one
measured pass to be meaningful.
warmup_frames
Number of frames to exclude from the head of *each* measured
pass when computing the aggregate. Graph compilation and XLA
kernel JIT usually bite only the first 1-3 frames of a pass,
so the default of 3 is a safe floor for Apple Silicon and
CUDA alike.
capture_reference
When ``True``, the :class:`VideoPredictions` from the last
measured pass is preserved on the outcome. Used by the
``--compare-cpu`` flow to diff poses across devices; callers
that only want timings should leave this ``False``.
Returns
-------
BenchmarkRunOutcome
The serialisable :class:`~neuropose.io.BenchmarkResult` plus
(optionally) the reference predictions.
Raises
------
ValueError
If ``repeats`` is less than 2 (no measured passes after
discarding the warmup pass) or ``warmup_frames`` is negative.
"""
if repeats < 2:
raise ValueError(f"repeats must be >= 2; got {repeats}")
if warmup_frames < 0:
raise ValueError(f"warmup_frames must be >= 0; got {warmup_frames}")
passes: list[PerformanceMetrics] = []
reference_predictions: VideoPredictions | None = None
for i in range(repeats):
result = estimator.process_video(video_path)
passes.append(result.metrics)
# Only the *last* measured pass needs to be captured for
# divergence comparison. Earlier passes would just be
# overwritten, so we avoid holding their frame dicts in memory.
if capture_reference and i == repeats - 1:
reference_predictions = result.predictions
aggregate = _aggregate_passes(passes[1:], warmup_frames=warmup_frames)
benchmark_result = BenchmarkResult(
video_name=video_path.name,
repeats=repeats,
warmup_frames=warmup_frames,
warmup_pass=passes[0],
measured_passes=passes[1:],
aggregate=aggregate,
)
return BenchmarkRunOutcome(
result=benchmark_result,
reference_predictions=reference_predictions,
)
def compute_poses3d_divergence(
reference: VideoPredictions,
other: VideoPredictions,
) -> tuple[float, int]:
"""Return the maximum absolute ``poses3d`` divergence in mm.
Walks both prediction sets frame-by-frame and compares each
``(person, joint, axis)`` entry. Frames are matched by name (the
six-digit ``frame_000000`` identifier), which assumes both runs
processed the same source video and decoded the same number of
frames the benchmark always does, since it invokes
``process_video`` on the same file with the same OpenCV build.
Parameters
----------
reference
Predictions from the primary device pass (typically GPU).
other
Predictions from the secondary device pass (typically CPU via
a ``--force-cpu`` subprocess).
Returns
-------
tuple[float, int]
``(max_divergence_mm, frame_count_compared)``. The integer is
the number of frames that actually contributed to the diff —
frames with mismatched detection counts are skipped rather
than raising, so the caller can tell "is the number trustworthy?"
from the count alone.
Raises
------
ValueError
If the two prediction sets report different frame counts (the
upstream benchmark should never produce that, so it is a real
bug when it happens).
"""
ref_names = reference.frame_names()
other_names = other.frame_names()
if len(ref_names) != len(other_names):
raise ValueError(
f"frame count mismatch: reference has {len(ref_names)}, other has {len(other_names)}"
)
max_diff = 0.0
compared = 0
for ref_name, other_name in zip(ref_names, other_names, strict=True):
ref_frame = reference[ref_name]
other_frame = other[other_name]
if len(ref_frame.poses3d) != len(other_frame.poses3d):
# Detection-count mismatches mean the two devices disagreed
# about *how many people* were in the frame, not about joint
# positions. Skip these frames for the divergence metric
# but do not silently mask them — the comparison result
# surfaces frame_count_compared so the caller can tell.
continue
if not ref_frame.poses3d:
# Both zero detections → nothing to compare, nothing to skip.
compared += 1
continue
ref_arr = np.asarray(ref_frame.poses3d, dtype=float)
other_arr = np.asarray(other_frame.poses3d, dtype=float)
if ref_arr.shape != other_arr.shape:
continue
diff = float(np.abs(ref_arr - other_arr).max())
if diff > max_diff:
max_diff = diff
compared += 1
return max_diff, compared
def format_benchmark_report(result: BenchmarkResult) -> str:
"""Return a human-readable report for ``result``.
Renders a multi-line summary suitable for stdout. The CLI uses this
for the operator-facing output; the JSON on disk is the machine-
readable form.
"""
agg = result.aggregate
lines = [
f"Benchmark: {result.video_name}",
f" device: {agg.active_device}"
+ (" (tensorflow-metal)" if agg.tensorflow_metal_active else ""),
f" tf version: {agg.tensorflow_version}",
f" model load: {_format_model_load(result.warmup_pass.model_load_seconds)}",
f" repeats: {result.repeats} ({agg.repeats_measured} measured, 1 discarded)",
f" warmup: {result.warmup_frames} frame(s) excluded per pass",
"",
"Per-frame latency (ms, across measured passes after warmup):",
f" mean: {agg.mean_frame_latency_ms:8.2f}",
f" p50: {agg.p50_frame_latency_ms:8.2f}",
f" p95: {agg.p95_frame_latency_ms:8.2f}",
f" p99: {agg.p99_frame_latency_ms:8.2f}",
f" stddev: {agg.stddev_frame_latency_ms:8.2f}",
"",
f"Throughput: {agg.mean_throughput_fps:.2f} fps (mean across measured passes)",
f"Peak RSS: {agg.peak_rss_mb_max:.0f} MB",
]
if result.cpu_comparison is not None:
cmp = result.cpu_comparison
lines.extend(
[
"",
"CPU comparison:",
f" primary throughput: {cmp.primary_aggregate.mean_throughput_fps:.2f} fps",
f" cpu throughput: {cmp.cpu_aggregate.mean_throughput_fps:.2f} fps",
f" speedup: {cmp.speedup:.2f}x",
f" max poses3d divergence: {cmp.max_poses3d_divergence_mm:.4f} mm "
f"({cmp.frame_count_compared} frames compared)",
]
)
return "\n".join(lines)
# ---------------------------------------------------------------------------
# Private helpers
# ---------------------------------------------------------------------------
def _aggregate_passes(
measured: list[PerformanceMetrics],
*,
warmup_frames: int,
) -> BenchmarkAggregate:
"""Compute a :class:`BenchmarkAggregate` from a list of measured passes."""
if not measured:
# Caller guarantees repeats >= 2 so measured is non-empty, but
# validate here anyway: an empty list would make np.percentile
# fail with a confusing error downstream.
raise ValueError("cannot aggregate zero measured passes")
latencies: list[float] = []
throughputs: list[float] = []
peak_rss = 0.0
for p in measured:
# Slice off the warmup head of each pass. If warmup_frames
# exceeds the pass length, the slice is empty and contributes
# nothing, which is the correct behaviour for a pathologically
# short video.
kept = p.per_frame_latencies_ms[warmup_frames:]
latencies.extend(kept)
frame_count = len(p.per_frame_latencies_ms)
if frame_count > 0 and p.total_seconds > 0:
throughputs.append(frame_count / p.total_seconds)
if p.peak_rss_mb > peak_rss:
peak_rss = p.peak_rss_mb
if not latencies:
# Could happen if warmup_frames >= frames_per_pass for every
# measured pass. Aggregate with all-zero timing rather than
# exploding; the CLI surfaces the zero-throughput number
# clearly enough that the operator will notice.
return BenchmarkAggregate(
repeats_measured=len(measured),
warmup_frames_per_pass=warmup_frames,
mean_frame_latency_ms=0.0,
p50_frame_latency_ms=0.0,
p95_frame_latency_ms=0.0,
p99_frame_latency_ms=0.0,
stddev_frame_latency_ms=0.0,
mean_throughput_fps=float(np.mean(throughputs)) if throughputs else 0.0,
peak_rss_mb_max=peak_rss,
active_device=measured[0].active_device,
tensorflow_metal_active=measured[0].tensorflow_metal_active,
tensorflow_version=measured[0].tensorflow_version,
)
latencies_arr = np.asarray(latencies, dtype=float)
return BenchmarkAggregate(
repeats_measured=len(measured),
warmup_frames_per_pass=warmup_frames,
mean_frame_latency_ms=float(latencies_arr.mean()),
p50_frame_latency_ms=float(np.percentile(latencies_arr, 50)),
p95_frame_latency_ms=float(np.percentile(latencies_arr, 95)),
p99_frame_latency_ms=float(np.percentile(latencies_arr, 99)),
stddev_frame_latency_ms=float(latencies_arr.std()),
mean_throughput_fps=float(np.mean(throughputs)) if throughputs else 0.0,
peak_rss_mb_max=peak_rss,
active_device=measured[0].active_device,
tensorflow_metal_active=measured[0].tensorflow_metal_active,
tensorflow_version=measured[0].tensorflow_version,
)
def _format_model_load(seconds: float | None) -> str:
"""Format the ``model_load_seconds`` field for the text report."""
if seconds is None:
return "injected (not loaded by benchmark)"
return f"{seconds:.2f} s"


@ -1,11 +1,20 @@
"""NeuroPose command-line interface.
Five subcommands:
- ``neuropose watch`` — run the :class:`~neuropose.interfacer.Interfacer`
  daemon against the configured input directory.
- ``neuropose process <video>`` — run the estimator on a single video and
  write the predictions JSON to disk.
- ``neuropose segment <results>`` — post-hoc repetition segmentation of
  an existing predictions file. Attaches a named
  :class:`~neuropose.io.Segmentation` to every video it contains and
  writes the file back atomically.
- ``neuropose benchmark <video>`` — multi-pass inference benchmark for
  a single video, with optional ``--compare-cpu`` for Apple-Silicon
  vs CPU numerical-divergence checks. Prints a human report to stdout
  and (optionally) writes a structured :class:`~neuropose.io.BenchmarkResult`
  JSON to ``--output``.
- ``neuropose analyze <results>`` — stubbed placeholder pending the
  analyzer rewrite in commit 10.
@ -31,7 +40,9 @@ exception classes we expect from the layers below.
from __future__ import annotations
import json
import logging
from enum import StrEnum
from pathlib import Path
from typing import Annotated
@ -42,7 +53,19 @@ from neuropose import __version__
from neuropose.config import Settings
from neuropose.estimator import Estimator
from neuropose.interfacer import AlreadyRunningError, Interfacer
from neuropose.io import (
BenchmarkResult,
CpuComparisonResult,
ExtractorSpec,
JobResults,
VideoPredictions,
load_benchmark_result,
load_job_results,
load_video_predictions,
save_benchmark_result,
save_job_results,
save_video_predictions,
)
logger = logging.getLogger(__name__)
@ -233,6 +256,626 @@ def process(
typer.echo(f"wrote {out_path} ({result.frame_count} frames)")
# ---------------------------------------------------------------------------
# segment
# ---------------------------------------------------------------------------
class _ExtractorKind(StrEnum):
"""CLI-facing extractor-kind enum for ``neuropose segment --extractor``."""
JOINT_AXIS = "joint_axis"
JOINT_PAIR_DISTANCE = "joint_pair_distance"
JOINT_SPEED = "joint_speed"
JOINT_ANGLE = "joint_angle"
def _resolve_joint(token: str) -> int:
"""Resolve a joint specifier to an integer index.
Accepts either a string joint name from the berkeley_mhad_43 skeleton
(e.g. ``"rwri"``) or a plain integer as a string. Raises
``typer.BadParameter`` with a helpful message on failure so CLI users
get a short error rather than a pydantic traceback.
"""
# Deferred import so the CLI module itself stays free of the analyzer
# heavy imports when the user is only running watch/process.
from neuropose.analyzer.segment import JOINT_INDEX
stripped = token.strip()
if not stripped:
raise typer.BadParameter("joint specifier is empty")
if stripped in JOINT_INDEX:
return JOINT_INDEX[stripped]
try:
return int(stripped)
except ValueError as exc:
raise typer.BadParameter(
f"unknown joint {stripped!r}; expected an integer index or one of "
f"the berkeley_mhad_43 names (e.g. 'rwri', 'lkne')"
) from exc
def _parse_joint_list(raw: str, *, expected: int) -> tuple[int, ...]:
"""Split a comma-separated joint list into ``expected`` integer indices."""
tokens = [t for t in raw.split(",") if t.strip()]
if len(tokens) != expected:
raise typer.BadParameter(
f"expected {expected} comma-separated joint specifiers; got {len(tokens)}: {raw!r}"
)
return tuple(_resolve_joint(t) for t in tokens)
def _build_extractor_spec(
kind: _ExtractorKind,
*,
joint: str | None,
joints: str | None,
triplet: str | None,
axis: int | None,
invert: bool,
) -> ExtractorSpec:
"""Translate CLI flags into a serializable :class:`ExtractorSpec`.
Each extractor kind consumes a different subset of the flags; unused
flags are ignored for ergonomics (so users can leave defaults in shell
aliases) but missing required flags raise ``typer.BadParameter`` with
the specific missing name so the error is obvious.
"""
from neuropose.analyzer.segment import (
joint_angle,
joint_axis,
joint_pair_distance,
joint_speed,
)
if kind is _ExtractorKind.JOINT_AXIS:
if joint is None:
raise typer.BadParameter("--joint is required for joint_axis")
if axis is None:
raise typer.BadParameter("--axis is required for joint_axis")
if not (0 <= axis <= 2):
raise typer.BadParameter(f"--axis must be 0, 1, or 2; got {axis}")
return joint_axis(_resolve_joint(joint), axis, invert=invert)
if kind is _ExtractorKind.JOINT_PAIR_DISTANCE:
if joints is None:
raise typer.BadParameter(
"--joints is required for joint_pair_distance (e.g. --joints lwri,rwri)"
)
j1, j2 = _parse_joint_list(joints, expected=2)
return joint_pair_distance(j1, j2)
if kind is _ExtractorKind.JOINT_SPEED:
if joint is None:
raise typer.BadParameter("--joint is required for joint_speed")
return joint_speed(_resolve_joint(joint))
if kind is _ExtractorKind.JOINT_ANGLE:
if triplet is None:
raise typer.BadParameter(
"--triplet is required for joint_angle (e.g. --triplet larm,lelb,lwri)"
)
a, b, c = _parse_joint_list(triplet, expected=3)
return joint_angle(a, b, c)
raise typer.BadParameter(f"unknown extractor kind: {kind}")
def _load_predictions_or_results(
path: Path,
) -> tuple[JobResults | VideoPredictions, bool]:
"""Load a results file, auto-detecting JobResults vs VideoPredictions.
Returns the loaded object and a boolean ``is_job_results`` flag so
the caller knows which save helper to use when writing back. The
distinction is made by trying :class:`VideoPredictions` first (the
more restrictive shape) and falling back to :class:`JobResults`.
"""
with path.open("r", encoding="utf-8") as f:
raw = json.load(f)
# A JobResults is a mapping of video-name → VideoPredictions; a bare
# VideoPredictions always has a top-level "metadata" key. This is
# cheap enough to dispatch on without a full validation pass.
if isinstance(raw, dict) and "metadata" in raw and "frames" in raw:
return load_video_predictions(path), False
return load_job_results(path), True
@app.command()
def segment(
ctx: typer.Context,
results: Annotated[
Path,
typer.Argument(
exists=True,
file_okay=True,
dir_okay=False,
readable=True,
help="Path to a results.json (JobResults) or predictions.json "
"(single VideoPredictions).",
),
],
name: Annotated[
str,
typer.Option(
"--name",
help="Key under which the segmentation will be stored in "
"VideoPredictions.segmentations (e.g. 'cup_lift').",
),
],
extractor: Annotated[
_ExtractorKind,
typer.Option(
"--extractor",
help="Which segmentation signal to extract from each frame.",
),
],
joint: Annotated[
str | None,
typer.Option(
"--joint",
help="Joint specifier for joint_axis and joint_speed extractors. "
"Accepts a berkeley_mhad_43 name ('rwri', 'lkne') or an integer index.",
),
] = None,
joints: Annotated[
str | None,
typer.Option(
"--joints",
help="Comma-separated pair for joint_pair_distance (e.g. 'lwri,rwri').",
),
] = None,
triplet: Annotated[
str | None,
typer.Option(
"--triplet",
help="Comma-separated (a,b,c) for joint_angle; angle is at b.",
),
] = None,
axis: Annotated[
int | None,
typer.Option(
"--axis",
help="Spatial axis for joint_axis: 0=x, 1=y, 2=z.",
),
] = None,
invert: Annotated[
bool,
typer.Option(
"--invert",
help="Negate the signal (joint_axis only); useful when the "
"movement of interest is a decrease in the selected coordinate.",
),
] = False,
person_index: Annotated[
int,
typer.Option("--person-index", min=0, help="Which detected person to use."),
] = 0,
min_distance_seconds: Annotated[
float | None,
typer.Option(
"--min-distance-seconds",
min=0.0,
help="Minimum time between successive repetition peaks.",
),
] = None,
min_prominence: Annotated[
float | None,
typer.Option(
"--min-prominence",
help="Minimum scipy peak prominence on the raw signal.",
),
] = None,
min_height: Annotated[
float | None,
typer.Option(
"--min-height",
help="Minimum signal value to qualify as a peak.",
),
] = None,
pad_seconds: Annotated[
float,
typer.Option(
"--pad-seconds",
min=0.0,
help="Amount of time to extend each segment on both sides.",
),
] = 0.0,
output: Annotated[
Path | None,
typer.Option(
"--output",
"-o",
help="Where to write the segmented file. Defaults to overwriting the input atomically.",
dir_okay=False,
writable=True,
),
] = None,
force: Annotated[
bool,
typer.Option(
"--force",
help="Overwrite an existing segmentation with the same --name. "
"Without --force, a collision is a usage error.",
),
] = False,
) -> None:
"""Run post-hoc repetition segmentation over an existing predictions file.
Loads the file (auto-detecting JobResults vs single VideoPredictions),
runs :func:`neuropose.analyzer.segment.segment_predictions` on each
video with the chosen extractor and thresholds, attaches the result
under ``--name`` in every video's ``segmentations`` mapping, and
writes the file back to ``--output`` (or the input path by default).
The command mutates only the ``segmentations`` field; inference
output — per-frame poses and video metadata — round-trips unchanged.
"""
del ctx # settings not needed; this command is pure post-processing
# Deferred import keeps the CLI module's top-level imports free of
# scipy so watch/process do not pay the import cost on startup.
from neuropose.analyzer.segment import segment_predictions
try:
spec = _build_extractor_spec(
extractor,
joint=joint,
joints=joints,
triplet=triplet,
axis=axis,
invert=invert,
)
except typer.BadParameter as exc:
typer.echo(f"error: {exc}", err=True)
raise typer.Exit(code=EXIT_USAGE) from exc
try:
loaded, is_job_results = _load_predictions_or_results(results)
except (ValidationError, json.JSONDecodeError) as exc:
typer.echo(f"error: could not load {results}: {exc}", err=True)
raise typer.Exit(code=EXIT_USAGE) from exc
# Normalise to an iterable of (video_name, VideoPredictions) so the
# per-video segmentation loop is the same for both input shapes.
if is_job_results:
assert isinstance(loaded, JobResults)
videos: dict[str, VideoPredictions] = dict(loaded.root)
else:
assert isinstance(loaded, VideoPredictions)
videos = {results.name: loaded}
updated: dict[str, VideoPredictions] = {}
total_segments = 0
for video_name, preds in videos.items():
if name in preds.segmentations and not force:
typer.echo(
f"error: video {video_name!r} already has a segmentation named "
f"{name!r}; pass --force to overwrite.",
err=True,
)
raise typer.Exit(code=EXIT_USAGE)
try:
result = segment_predictions(
preds,
spec,
person_index=person_index,
min_distance_seconds=min_distance_seconds,
min_prominence=min_prominence,
min_height=min_height,
pad_seconds=pad_seconds,
)
except (ValueError, ImportError) as exc:
typer.echo(f"error: segmentation failed for {video_name}: {exc}", err=True)
raise typer.Exit(code=EXIT_USAGE) from exc
new_segmentations = dict(preds.segmentations)
new_segmentations[name] = result
updated[video_name] = preds.model_copy(update={"segmentations": new_segmentations})
total_segments += len(result.segments)
typer.echo(f"{video_name}: {len(result.segments)} segment(s) under name={name!r}")
out_path = output if output is not None else results
if is_job_results:
save_job_results(out_path, JobResults(root=updated))
else:
# Single-VideoPredictions input: the dict has exactly one entry.
(only,) = updated.values()
save_video_predictions(out_path, only)
typer.echo(f"wrote {out_path} ({len(updated)} video(s), {total_segments} total segment(s))")
# ---------------------------------------------------------------------------
# benchmark
# ---------------------------------------------------------------------------
def _force_cpu_only() -> None:
"""Hide all GPU devices from TensorFlow before any TF op initialises.
Called from the ``--force-cpu`` benchmark path before the estimator
loads the model. ``tf.config.set_visible_devices`` must be invoked
before the runtime has touched a GPU device, which in practice
means before any other TF API call — so this function imports TF
itself and then immediately pins visibility to an empty GPU list.
The import is deferred inside the function so that non-benchmark
CLI paths (``watch``, ``process``) do not pay the ~2 s TensorFlow
import cost unless they are actually doing inference.
"""
try:
import tensorflow as tf
except ImportError as exc:
raise typer.BadParameter(
"--force-cpu requires TensorFlow to be installed, but import failed"
) from exc
# Silence is fine here: a stale visibility call on a runtime that
# already initialised GPU raises RuntimeError, but the benchmark CLI
# calls this as its first TF op, so that should never happen.
tf.config.set_visible_devices([], "GPU")
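# Hedged sketch, not called anywhere in the CLI: verifying the effect of
# ``_force_cpu_only()`` in an interactive session, using only public TF
# APIs.
def _example_verify_cpu_only() -> None:
    _force_cpu_only()
    import tensorflow as tf

    assert tf.config.get_visible_devices("GPU") == []
    assert tf.config.get_visible_devices("CPU")  # CPU devices stay visible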
def _run_compare_cpu_subprocess(
video: Path,
*,
repeats: int,
warmup_frames: int,
) -> tuple[BenchmarkResult, VideoPredictions]:
"""Spawn a ``--force-cpu`` child to get a CPU baseline + predictions.
The parent benchmark runs on whatever device the platform exposes
by default (Apple Silicon Metal GPU when the extra is installed;
Linux CPU by default). For the ``--compare-cpu`` comparison we
need a second run that is *guaranteed* to hit CPU, and we need the
resulting ``poses3d`` to diff against the parent's output.
The cleanest way to guarantee device isolation is to run the CPU
pass in a subprocess with GPU visibility hidden before any TF
import. In-process device switching via
``tf.device("/CPU:0")`` is not reliable against SavedModels whose
ConcreteFunctions may carry baked-in device placements, and
calling ``set_visible_devices`` after the GPU has already been
touched is a hard error from TF's runtime.
The subprocess is invoked as ``neuropose benchmark <video> --force-cpu
--repeats N --warmup-frames M --output <tmp> --predictions-output
<tmp>`` so the child writes both the benchmark result and the last
measured pass's predictions to disk. The parent reads them, deletes
the tempfiles, and returns.
"""
# Deferred imports: subprocess and tempfile are pure-Python and
# cheap, but keeping them inside the function keeps the module's
# top-level import surface small.
import subprocess
import sys
import tempfile
with tempfile.TemporaryDirectory(prefix="neuropose_cpu_bench_") as td:
td_path = Path(td)
result_path = td_path / "result.json"
predictions_path = td_path / "predictions.json"
cmd = [
sys.executable,
"-m",
"neuropose.cli",
"benchmark",
str(video),
"--repeats",
str(repeats),
"--warmup-frames",
str(warmup_frames),
"--output",
str(result_path),
"--predictions-output",
str(predictions_path),
"--force-cpu",
]
logger.info("spawning cpu-baseline subprocess: %s", " ".join(cmd))
proc = subprocess.run(cmd, capture_output=True, text=True, check=False)
if proc.returncode != 0:
raise RuntimeError(
f"cpu-baseline subprocess failed (exit {proc.returncode}): {proc.stderr.strip()}"
)
return (
load_benchmark_result(result_path),
load_video_predictions(predictions_path),
)
@app.command()
def benchmark(
ctx: typer.Context,
video: Annotated[
Path,
typer.Argument(
exists=True,
file_okay=True,
dir_okay=False,
readable=True,
help="Path to an input video file to benchmark.",
),
],
repeats: Annotated[
int,
typer.Option(
"--repeats",
min=2,
help="Total passes to run; the first is always discarded as warmup.",
),
] = 5,
warmup_frames: Annotated[
int,
typer.Option(
"--warmup-frames",
min=0,
help="Frames to discard from the head of each measured pass.",
),
] = 3,
compare_cpu: Annotated[
bool,
typer.Option(
"--compare-cpu",
help=(
"After the primary benchmark, spawn a subprocess with "
"--force-cpu to produce a CPU baseline on the same video, "
"then report throughput speedup and max poses3d divergence "
"in millimetres. Intended for Apple Silicon numerics "
"verification."
),
),
] = False,
force_cpu: Annotated[
bool,
typer.Option(
"--force-cpu",
help=(
"Hide all GPU devices from TensorFlow before loading the "
"model, so inference is guaranteed to run on CPU. Used "
"internally by --compare-cpu's subprocess; rarely needed "
"as a user-facing flag."
),
),
] = False,
output: Annotated[
Path | None,
typer.Option(
"--output",
"-o",
help=(
"Where to write the structured BenchmarkResult JSON. "
"When omitted, only the human-readable report prints to "
"stdout."
),
dir_okay=False,
writable=True,
),
] = None,
predictions_output: Annotated[
Path | None,
typer.Option(
"--predictions-output",
help=(
"Where to write the last measured pass's VideoPredictions "
"JSON. Used internally by --compare-cpu to supply the "
"divergence computation with the CPU pass's poses3d; "
"leave unset for normal benchmark runs."
),
dir_okay=False,
writable=True,
),
] = None,
) -> None:
"""Run a multi-pass inference benchmark for a single video.
The first pass is always discarded (graph compilation / filesystem
caches), and subsequent passes contribute to a
:class:`~neuropose.io.BenchmarkAggregate` with mean / p50 / p95 / p99
per-frame latencies, throughput, peak RSS, active device, and the
exact TensorFlow version that ran the measurement. With
``--compare-cpu``, a second run is spawned as a subprocess with
GPU visibility hidden; the resulting CPU baseline is diffed against
the primary run's ``poses3d`` array to produce a maximum absolute
divergence in millimetres, which is the answer to the "is
tensorflow-metal producing correct numerics?" question that
RESEARCH.md's TensorFlow-version-compatibility section leaves open.
"""
settings: Settings = ctx.obj
# Deferred imports: the benchmark module and its dependencies
# (numpy, time) are needed for this subcommand only.
from neuropose.benchmark import (
compute_poses3d_divergence,
format_benchmark_report,
run_benchmark,
)
if force_cpu and compare_cpu:
typer.echo(
"error: --force-cpu and --compare-cpu are mutually exclusive; "
"--force-cpu is an implementation detail of --compare-cpu's "
"subprocess and should not be combined with it.",
err=True,
)
raise typer.Exit(code=EXIT_USAGE)
if force_cpu:
_force_cpu_only()
estimator = Estimator(
device=settings.device,
default_fov_degrees=settings.default_fov_degrees,
)
try:
estimator.load_model(cache_dir=settings.model_cache_dir)
except NotImplementedError as exc:
typer.echo(f"error: {exc}", err=True)
raise typer.Exit(code=EXIT_PENDING) from exc
outcome = run_benchmark(
estimator,
video,
repeats=repeats,
warmup_frames=warmup_frames,
capture_reference=compare_cpu or predictions_output is not None,
)
result = outcome.result
# CPU comparison path: spawn child, diff poses3d, rewrap result
# with a cpu_comparison field populated.
if compare_cpu:
if outcome.reference_predictions is None:
# Should be impossible because we asked for it above; guard
# anyway so a future refactor cannot silently drop it.
typer.echo(
"error: --compare-cpu requires reference predictions, "
"but none were captured. This is a bug.",
err=True,
)
raise typer.Exit(code=EXIT_USAGE)
try:
cpu_result, cpu_predictions = _run_compare_cpu_subprocess(
video,
repeats=repeats,
warmup_frames=warmup_frames,
)
except RuntimeError as exc:
typer.echo(f"error: {exc}", err=True)
raise typer.Exit(code=EXIT_USAGE) from exc
max_diff, compared = compute_poses3d_divergence(
outcome.reference_predictions, cpu_predictions
)
primary_throughput = result.aggregate.mean_throughput_fps
cpu_throughput = cpu_result.aggregate.mean_throughput_fps
speedup = primary_throughput / cpu_throughput if cpu_throughput > 0 else 0.0
comparison = CpuComparisonResult(
primary_aggregate=result.aggregate,
cpu_aggregate=cpu_result.aggregate,
speedup=speedup,
max_poses3d_divergence_mm=max_diff,
frame_count_compared=compared,
)
result = result.model_copy(update={"cpu_comparison": comparison})
typer.echo(format_benchmark_report(result))
if output is not None:
save_benchmark_result(output, result)
typer.echo(f"\nwrote {output}")
if predictions_output is not None:
if outcome.reference_predictions is None:
typer.echo(
"error: --predictions-output was given but no reference "
"predictions were captured. This is a bug.",
err=True,
)
raise typer.Exit(code=EXIT_USAGE)
save_video_predictions(predictions_output, outcome.reference_predictions)
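# Illustration only: comparing two BenchmarkResult files offline, e.g. a
# default-device run and a --force-cpu run each saved via --output. The
# file names are hypothetical; the formula mirrors the in-process
# --compare-cpu path above.
def _example_offline_speedup() -> None:
    gpu = load_benchmark_result(Path("bench_gpu.json"))
    cpu = load_benchmark_result(Path("bench_cpu.json"))
    cpu_fps = cpu.aggregate.mean_throughput_fps
    speedup = gpu.aggregate.mean_throughput_fps / cpu_fps if cpu_fps > 0 else 0.0
    typer.echo(f"throughput speedup vs CPU: {speedup:.2f}x")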
# ---------------------------------------------------------------------------
# analyze (stub)
# ---------------------------------------------------------------------------

View File

@@ -34,15 +34,22 @@ model is present raises :class:`ModelNotLoadedError`.
from __future__ import annotations
import logging
import time
from collections.abc import Callable
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any
import cv2
import psutil
from neuropose._model import load_metrabs_model
from neuropose.io import (
FramePrediction,
PerformanceMetrics,
VideoMetadata,
VideoPredictions,
)
logger = logging.getLogger(__name__)
@@ -78,9 +85,23 @@ class ProcessVideoResult:
predictions
The validated :class:`VideoPredictions` object, containing both the
per-frame predictions and the ``VideoMetadata`` envelope.
metrics
Timing and resource-usage metrics for the call. Always populated
so real-world runs carry their own measurements without the
caller having to opt in. ``metrics.model_load_seconds`` is
``None`` when the model was injected rather than loaded via
:meth:`Estimator.load_model`.
"""
predictions: VideoPredictions
metrics: PerformanceMetrics = field(
default_factory=lambda: PerformanceMetrics(
total_seconds=0.0,
peak_rss_mb=0.0,
active_device="/CPU:0",
tensorflow_version="unknown",
)
)
@property
def frame_count(self) -> int:
@@ -132,6 +153,11 @@ class Estimator:
self.skeleton = skeleton
self.default_fov_degrees = default_fov_degrees
self._model: Any | None = model
# ``None`` when the model was injected via the constructor (tests,
# shared-model callers) — we never fake a zero load time. Set on
# successful ``load_model`` below so the next ``process_video`` can
# pass the real number through into ``PerformanceMetrics``.
self._model_load_seconds: float | None = None
# -- model lifecycle ----------------------------------------------------
@@ -169,8 +195,10 @@
logger.debug("Model already loaded; skipping reload.")
return
logger.info("Loading MeTRAbs model (cache_dir=%s)", cache_dir)
start = time.perf_counter()
self._model = load_metrabs_model(cache_dir=cache_dir)
logger.info("MeTRAbs model loaded.")
self._model_load_seconds = time.perf_counter() - start
logger.info("MeTRAbs model loaded in %.2f s", self._model_load_seconds)
# -- inference ----------------------------------------------------------
@@ -223,6 +251,14 @@
if not cap.isOpened():
raise VideoDecodeError(f"OpenCV could not open video: {video_path}")
# Start metrics collection *after* the file-not-found and
# decode-error paths so total_seconds reflects "work the estimator
# actually did" rather than setup failures the caller handles.
process = psutil.Process()
peak_rss_bytes = process.memory_info().rss
per_frame_latencies_ms: list[float] = []
overall_start = time.perf_counter()
try:
fps = float(cap.get(cv2.CAP_PROP_FPS))
width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
@@ -246,9 +282,19 @@
break
# MeTRAbs was trained on RGB images; OpenCV gives us BGR.
rgb_frame = cv2.cvtColor(bgr_frame, cv2.COLOR_BGR2RGB)
# Time only the model call — cv2 decode and colour
# conversion are not what the benchmark is asking about.
frame_start = time.perf_counter()
prediction = self._infer_frame(model, rgb_frame, fov)
per_frame_latencies_ms.append((time.perf_counter() - frame_start) * 1000.0)
frames[f"frame_{frame_index:06d}"] = prediction
frame_index += 1
# Sample RSS once per frame. Cheap (a single proc/mach
# syscall) and produces a monotonic peak without needing
# a background thread.
rss = process.memory_info().rss
if rss > peak_rss_bytes:
peak_rss_bytes = rss
if progress is not None:
progress(frame_index, total_hint)
@@ -264,8 +310,28 @@
if frame_index == 0:
logger.warning("Video %s contained no decodable frames.", video_path)
total_seconds = time.perf_counter() - overall_start
device_info = _detect_active_device()
metrics = PerformanceMetrics(
model_load_seconds=self._model_load_seconds,
total_seconds=total_seconds,
per_frame_latencies_ms=per_frame_latencies_ms,
peak_rss_mb=peak_rss_bytes / (1024.0 * 1024.0),
active_device=device_info.device,
tensorflow_metal_active=device_info.metal_active,
tensorflow_version=device_info.tf_version,
)
logger.info(
"Processed %d frames in %.2f s (%.1f fps, peak RSS %.0f MB, device %s)",
frame_index,
total_seconds,
frame_index / total_seconds if total_seconds > 0 else 0.0,
metrics.peak_rss_mb,
metrics.active_device,
)
predictions = VideoPredictions(metadata=metadata, frames=frames)
return ProcessVideoResult(predictions=predictions, metrics=metrics)
# -- internals ----------------------------------------------------------
@@ -306,3 +372,62 @@ def _to_nested_list(value: Any) -> Any:
if hasattr(value, "tolist"):
return value.tolist()
return list(value)
@dataclass(frozen=True)
class _ActiveDeviceInfo:
"""Small bundle of device info gathered for ``PerformanceMetrics``."""
device: str
metal_active: bool
tf_version: str
def _detect_active_device() -> _ActiveDeviceInfo:
"""Report which TF device the current process would use for inference.
Returns a synthetic "unknown" bundle if TensorFlow is not importable:
unit tests that inject a fake model do not have a real TF install at
the metrics layer, and we do not want to fail the call for that.
On Apple Silicon with the ``[metal]`` extra installed, TensorFlow
exposes a ``GPU`` device contributed by the ``tensorflow-metal``
PluggableDevice. The distinction between that and a real CUDA GPU
matters for :mod:`neuropose.benchmark`'s ``--compare-cpu`` flow, so
we surface it via ``metal_active``.
"""
try:
import tensorflow as tf
except ImportError:
return _ActiveDeviceInfo(
device="unknown",
metal_active=False,
tf_version="unknown",
)
tf_version = getattr(tf, "__version__", "unknown")
try:
gpu_devices = tf.config.list_physical_devices("GPU")
except Exception: # runtime-specific: never hard-fail metrics
gpu_devices = []
device = "/GPU:0" if gpu_devices else "/CPU:0"
metal_active = False
if gpu_devices:
try:
from importlib.metadata import PackageNotFoundError, version
try:
version("tensorflow-metal")
metal_active = True
except PackageNotFoundError:
metal_active = False
except Exception:
metal_active = False
return _ActiveDeviceInfo(
device=device,
metal_active=metal_active,
tf_version=tf_version,
)
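# Illustration only (assumes the constructor and ``load_model`` defaults
# are usable as-is and that ``process_video`` needs only the path; the
# video file name is hypothetical): consuming the always-on metrics.
def _example_consume_metrics() -> None:
    estimator = Estimator()
    estimator.load_model()
    result = estimator.process_video(Path("trial.mp4"))
    m = result.metrics
    logger.info("TF %s on %s in %.2f s", m.tensorflow_version, m.active_device, m.total_seconds)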

View File

@@ -19,9 +19,9 @@ from collections.abc import Iterator
from datetime import datetime
from enum import StrEnum
from pathlib import Path
from typing import Annotated, Any, Literal
from pydantic import BaseModel, ConfigDict, Field, RootModel, model_validator
class JobStatus(StrEnum):
@@ -76,18 +76,402 @@ class VideoMetadata(BaseModel):
height: int = Field(ge=0, description="Source video frame height in pixels.")
class PerformanceMetrics(BaseModel):
"""Timing and resource-usage metrics collected during inference.
Populated by :meth:`neuropose.estimator.Estimator.process_video` on
every call so real-world runs always carry their own timing without
the caller having to opt in. The values describe the estimator's
behaviour on one specific pass over a single video and are *not*
aggregates callers interested in distributional statistics
(p50/p95/p99) should pass the same video through
:func:`neuropose.benchmark.run_benchmark`, which runs multiple
passes and aggregates the resulting :class:`PerformanceMetrics`
instances.
Fields intentionally capture "what machine ran this":
``active_device`` reports the TensorFlow device string the model
ran on (``/GPU:0`` on Apple Silicon with ``tensorflow-metal``,
``/CPU:0`` otherwise), and ``tensorflow_version`` captures the
exact TF release so downstream readers can tell whether a
numerical discrepancy between two runs is a stack-version
difference rather than a real measurement.
Frozen so that the metrics captured at return time cannot be
edited in-place downstream.
"""
model_config = ConfigDict(extra="forbid", frozen=True)
model_load_seconds: float | None = Field(
default=None,
ge=0.0,
description=(
"Wall-clock time to load the MeTRAbs model, in seconds. "
"``None`` when the model was injected via ``Estimator(model=...)`` "
"rather than loaded through ``load_model()``, so the value is "
"never confused with a zero-cost cache hit."
),
)
total_seconds: float = Field(
ge=0.0,
description="Wall-clock time of ``process_video`` end-to-end.",
)
per_frame_latencies_ms: list[float] = Field(
default_factory=list,
description=(
"Per-frame inference latencies in milliseconds, one entry per "
"decoded frame in insertion order. Excludes video-decode time "
"and captures only the ``detect_poses`` call, so callers "
"aggregating over this list are measuring model throughput."
),
)
peak_rss_mb: float = Field(
ge=0.0,
description=(
"Maximum resident-set size observed during the call, in "
"megabytes. Sampled once after each frame via ``psutil``."
),
)
active_device: str = Field(
description=(
"TensorFlow device string the inference ran on, e.g. "
"``/CPU:0`` or ``/GPU:0``. Derived from "
"``tf.config.list_physical_devices('GPU')`` — when a GPU "
"device is visible the string is ``/GPU:0``, otherwise "
"``/CPU:0``."
),
)
tensorflow_metal_active: bool = Field(
default=False,
description=(
"``True`` if the ``tensorflow-metal`` PluggableDevice is "
"installed and contributed the visible GPU device, i.e. "
"MeTRAbs inference is running through Apple's Metal "
"Performance Shaders backend on Apple Silicon. ``False`` "
"otherwise, including on Linux CUDA builds."
),
)
tensorflow_version: str = Field(
description="Value of ``tensorflow.__version__`` at the time of the call.",
)
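# Minimal sketch with made-up values: how the fields relate to each other.
# ``run_benchmark`` computes the canonical aggregate; this only
# illustrates the schema.
def _example_metrics_throughput() -> None:
    m = PerformanceMetrics(
        total_seconds=2.0,
        per_frame_latencies_ms=[10.0] * 60,
        peak_rss_mb=512.0,
        active_device="/CPU:0",
        tensorflow_version="2.21.0",
    )
    fps = len(m.per_frame_latencies_ms) / m.total_seconds  # 30.0 frames/s
    print(f"{fps:.1f} fps")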
class BenchmarkAggregate(BaseModel):
"""Distributional statistics aggregated across benchmark passes.
Computed by :mod:`neuropose.benchmark` over the measured
:class:`PerformanceMetrics` instances of a multi-pass benchmark run:
the very first pass is always discarded as warmup (graph
compilation, file-system caches, etc.), and the first
``warmup_frames_per_pass`` frames of each remaining pass are also
excluded so graph-init cost inside a pass does not contaminate the
steady-state percentiles.
"""
model_config = ConfigDict(extra="forbid", frozen=True)
repeats_measured: int = Field(
ge=0,
description="Number of passes that contributed to the aggregate.",
)
warmup_frames_per_pass: int = Field(
ge=0,
description="Frames discarded from the head of each measured pass.",
)
mean_frame_latency_ms: float = Field(ge=0.0)
p50_frame_latency_ms: float = Field(ge=0.0)
p95_frame_latency_ms: float = Field(ge=0.0)
p99_frame_latency_ms: float = Field(ge=0.0)
stddev_frame_latency_ms: float = Field(ge=0.0)
mean_throughput_fps: float = Field(ge=0.0)
peak_rss_mb_max: float = Field(
ge=0.0,
description="Maximum peak RSS observed across all measured passes.",
)
active_device: str
tensorflow_metal_active: bool = False
tensorflow_version: str
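# Sketch of the percentile math the aggregate implies (an assumed detail
# of :mod:`neuropose.benchmark`, shown with numpy; the samples are made
# up for illustration).
def _example_latency_percentiles() -> None:
    import numpy as np

    latencies_ms = [12.0, 11.5, 13.0, 50.0]  # post-warmup-trim samples
    p50, p95, p99 = np.percentile(latencies_ms, [50, 95, 99])
    print(f"p50={p50:.1f} p95={p95:.1f} p99={p99:.1f} ms")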
class CpuComparisonResult(BaseModel):
"""Result of a ``--compare-cpu`` benchmark run.
The parent benchmark process runs on the platform's default device
(GPU if visible), and a subprocess is spawned with ``--force-cpu``
to run the exact same video on CPU. Both runs' aggregates are
preserved here alongside the maximum element-wise divergence in the
resulting ``poses3d`` arrays; that number is the answer to the
"is Metal numerically OK?" question that ``RESEARCH.md``'s
TensorFlow-version-compatibility section leaves open.
"""
model_config = ConfigDict(extra="forbid", frozen=True)
primary_aggregate: BenchmarkAggregate = Field(
description="Aggregate from the parent (default-device) run.",
)
cpu_aggregate: BenchmarkAggregate = Field(
description="Aggregate from the ``--force-cpu`` subprocess run.",
)
speedup: float = Field(
description=(
"``primary_aggregate.mean_throughput_fps / "
"cpu_aggregate.mean_throughput_fps``. Values greater than 1 "
"mean the primary device is faster than CPU; values less "
"than 1 mean CPU won (possible on tiny videos where device "
"initialisation dominates)."
),
)
max_poses3d_divergence_mm: float = Field(
ge=0.0,
description=(
"Maximum element-wise absolute difference (in millimetres) "
"between the primary-device and CPU-device ``poses3d`` "
"arrays, taken over every frame, detection, joint, and "
"axis. A small number (~1e-3 mm) is the expected floor; "
"anything above ~1e-2 mm warrants investigation before "
"trusting the primary device for clinical measurement."
),
)
frame_count_compared: int = Field(
ge=0,
description="Number of frames that entered the divergence computation.",
)
class BenchmarkResult(BaseModel):
"""Top-level result of :func:`neuropose.benchmark.run_benchmark`.
Carries the raw per-pass metrics, the aggregated summary, and (if
``--compare-cpu`` was requested) a :class:`CpuComparisonResult`.
Serialised to JSON for downstream regression tracking; pretty-printed
to stdout by the :mod:`neuropose.cli.benchmark` subcommand.
The ``video_name`` field intentionally stores only the file's
basename, not the full path, to match ``VideoMetadata``'s stance on
not persisting potentially subject-identifying filesystem paths.
"""
model_config = ConfigDict(extra="forbid", frozen=True)
video_name: str = Field(
description="Basename of the benchmarked video (no directory components).",
)
repeats: int = Field(
ge=1,
description="Total passes executed — includes the discarded warmup pass.",
)
warmup_frames: int = Field(
ge=0,
description=(
"Frames excluded from the head of each measured pass when "
"computing the aggregate statistics."
),
)
warmup_pass: PerformanceMetrics = Field(
description=(
"First pass, discarded from the aggregate. Preserved here so "
"readers can see the graph-compilation cost explicitly."
),
)
measured_passes: list[PerformanceMetrics] = Field(
description="Passes 2..N, i.e. those that contributed to the aggregate.",
)
aggregate: BenchmarkAggregate
cpu_comparison: CpuComparisonResult | None = None
class JointAxisExtractor(BaseModel):
"""Segmentation signal extracted from one axis of one joint.
The simplest extractor: picks a single spatial coordinate of a single
joint as the segmentation signal. Use ``invert=True`` when the motion
of interest is a decrease (e.g. a joint that dips down during the
repetition) so that peak detection can work on a maximum rather than
a minimum.
"""
model_config = ConfigDict(extra="forbid", frozen=True)
kind: Literal["joint_axis"] = "joint_axis"
joint: int = Field(ge=0, description="Joint index into the (J,) skeleton.")
axis: int = Field(ge=0, le=2, description="Spatial axis: 0=x, 1=y, 2=z.")
invert: bool = Field(default=False, description="Negate the signal so valleys become peaks.")
class JointPairDistanceExtractor(BaseModel):
"""Signal = Euclidean distance between two joints per frame.
Translation- and (partially) rotation-invariant. Good fit for tasks
where the target motion is a distance change between two body parts
(e.g. wrist-to-shoulder for a reach).
"""
model_config = ConfigDict(extra="forbid", frozen=True)
kind: Literal["joint_pair_distance"] = "joint_pair_distance"
joints: tuple[int, int] = Field(description="Ordered pair of joint indices.")
@model_validator(mode="after")
def _joints_distinct(self) -> JointPairDistanceExtractor:
if self.joints[0] == self.joints[1]:
raise ValueError("joint_pair_distance requires two distinct joints")
if self.joints[0] < 0 or self.joints[1] < 0:
raise ValueError("joint indices must be non-negative")
return self
class JointSpeedExtractor(BaseModel):
"""Signal = frame-to-frame displacement magnitude of one joint.
The first frame is padded with a 0 so the signal has the same length
as the input pose sequence. Useful when a repetition is characterised
by "the joint moves fast, then rests, then moves fast again" rather
than by any particular coordinate value.
"""
model_config = ConfigDict(extra="forbid", frozen=True)
kind: Literal["joint_speed"] = "joint_speed"
joint: int = Field(ge=0)
class JointAngleExtractor(BaseModel):
"""Signal = angle at joint ``b`` formed by the triplet ``(a, b, c)``.
Computed in radians in ``[0, pi]`` using
:func:`neuropose.analyzer.features.extract_joint_angles`. This is the
most translation- and rotation-invariant of the built-in extractors
and the natural choice for clinically meaningful metrics like knee
or elbow flexion.
"""
model_config = ConfigDict(extra="forbid", frozen=True)
kind: Literal["joint_angle"] = "joint_angle"
triplet: tuple[int, int, int] = Field(
description="``(a, b, c)`` joint indices; angle is computed at ``b``."
)
@model_validator(mode="after")
def _triplet_non_negative(self) -> JointAngleExtractor:
if any(idx < 0 for idx in self.triplet):
raise ValueError("joint indices must be non-negative")
return self
ExtractorSpec = Annotated[
JointAxisExtractor | JointPairDistanceExtractor | JointSpeedExtractor | JointAngleExtractor,
Field(discriminator="kind"),
]
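# Sketch: validation dispatches on the ``kind`` discriminator. Uses
# pydantic's TypeAdapter; the payload values are illustrative only.
def _example_extractor_spec_dispatch() -> None:
    from pydantic import TypeAdapter

    spec = TypeAdapter(ExtractorSpec).validate_python(
        {"kind": "joint_axis", "joint": 15, "axis": 1}
    )
    assert isinstance(spec, JointAxisExtractor)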
class SegmentationConfig(BaseModel):
"""Parameters that define a single segmentation pass.
The full config is serialized alongside the segments it produced so
that a reader of ``results.json`` can tell a year later, without
access to the code that wrote it exactly which joint was tracked
and which thresholds were applied. The ``method`` field is a version
stamp: if the segmentation algorithm changes, add a new literal here
and dispatch on it in the engine rather than silently reinterpreting
old files.
"""
model_config = ConfigDict(extra="forbid", frozen=True)
extractor: ExtractorSpec
person_index: int = Field(
default=0,
ge=0,
description="Which detected person to extract from each frame.",
)
min_distance_seconds: float | None = Field(
default=None,
ge=0.0,
description="Minimum time between successive repetition peaks.",
)
min_prominence: float | None = Field(
default=None,
description="Minimum scipy-style peak prominence on the raw signal.",
)
min_height: float | None = Field(
default=None,
description="Minimum signal value for a point to be considered a peak.",
)
pad_seconds: float = Field(
default=0.0,
ge=0.0,
description="Amount of time to extend each segment on both sides.",
)
method: Literal["valley_to_valley_v1"] = Field(
default="valley_to_valley_v1",
description="Name and version of the segmentation algorithm used.",
)
class Segment(BaseModel):
"""A single repetition window inside a pose sequence.
Frame indices are integers into the ``VideoPredictions.frames``
insertion order. ``start`` is inclusive, ``end`` is exclusive (Python
slice convention), and ``peak`` is the apex of the repetition inside
``[start, end)``.
"""
model_config = ConfigDict(extra="forbid", frozen=True)
start: int = Field(ge=0, description="Inclusive start frame index.")
end: int = Field(gt=0, description="Exclusive end frame index.")
peak: int = Field(ge=0, description="Frame index of the repetition's apex.")
@model_validator(mode="after")
def _check_ordering(self) -> Segment:
if self.end <= self.start:
raise ValueError(f"end ({self.end}) must be > start ({self.start})")
if not (self.start <= self.peak < self.end):
raise ValueError(
f"peak ({self.peak}) must be in [start, end) = [{self.start}, {self.end})"
)
return self
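# Mini example of the slice convention (illustrative values): a repetition
# spanning frames 10..29 inclusive with its apex at frame 20.
def _example_segment_convention() -> None:
    seg = Segment(start=10, end=30, peak=20)
    assert seg.end - seg.start == 20  # frames in the half-open [start, end)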
class Segmentation(BaseModel):
"""A labelled segmentation of a single video.
Pairs the :class:`SegmentationConfig` that produced the segments with
the segments themselves, so the serialized form is self-describing.
Multiple named :class:`Segmentation` objects can coexist on a single
:class:`VideoPredictions` via its ``segmentations`` field, typically
one per clinical task or per operator-chosen strategy.
"""
model_config = ConfigDict(extra="forbid", frozen=True)
config: SegmentationConfig
segments: list[Segment]
class VideoPredictions(BaseModel):
"""Per-frame predictions for a single video, paired with video metadata.
The ``frames`` mapping is keyed by frame identifier (``frame_<index>`` by
convention, zero-padded to 6 digits). The identifier is a stable string,
not a filesystem path; no PNG file is implied.
The optional ``segmentations`` mapping carries one or more post-hoc
:class:`Segmentation` objects keyed by operator-chosen name (e.g.
``"cup_lift"``). Downstream analysis code that expects rep-level
windows looks here. The field defaults to empty, so inference output
written before segmentation round-trips through this schema unchanged.
"""
model_config = ConfigDict(extra="forbid", frozen=True)
metadata: VideoMetadata
frames: dict[str, FramePrediction]
segmentations: dict[str, Segmentation] = Field(default_factory=dict)
def frame_names(self) -> list[str]:
"""Return frame identifiers in insertion order."""
@@ -195,6 +579,19 @@ def save_job_results(path: Path, results: JobResults) -> None:
_write_json_atomic(path, results.model_dump(mode="json"))
def load_benchmark_result(path: Path) -> BenchmarkResult:
"""Load and validate a benchmark-result JSON file."""
with path.open("r", encoding="utf-8") as f:
data: Any = json.load(f)
return BenchmarkResult.model_validate(data)
def save_benchmark_result(path: Path, result: BenchmarkResult) -> None:
"""Serialize a :class:`BenchmarkResult` to a JSON file atomically."""
path.parent.mkdir(parents=True, exist_ok=True)
_write_json_atomic(path, result.model_dump(mode="json"))
def load_status(path: Path) -> StatusFile:
"""Load the persistent job status file.

View File

@@ -0,0 +1,60 @@
"""Drift test for the hardcoded berkeley_mhad_43 joint-name constant.
:mod:`neuropose.analyzer.segment` ships ``JOINT_NAMES`` as a frozen
tuple of 43 strings so that post-processing callers can resolve
``"rwri"`` index without loading a multi-gigabyte TensorFlow model.
That is only safe while the hardcoded tuple actually matches what the
pinned MeTRAbs SavedModel reports.
This test loads the real model via :func:`neuropose._model.load_metrabs_model`
and asserts that ``JOINT_NAMES`` is byte-identical to
``model.per_skeleton_joint_names["berkeley_mhad_43"].numpy().astype(str)``.
If MeTRAbs ever ships a new skeleton under the same name or if we bump
the model pin to one whose ``berkeley_mhad_43`` skeleton is spelled
differently, this test is the drift detector.
Like every test under ``tests/integration/``, the file is marked
``@pytest.mark.slow`` and only runs under ``pytest --runslow``.
"""
from __future__ import annotations
from pathlib import Path
import pytest
from neuropose._model import load_metrabs_model
from neuropose.analyzer.segment import JOINT_NAMES
pytestmark = pytest.mark.slow
@pytest.fixture(scope="module")
def metrabs_model_cache_dir(tmp_path_factory: pytest.TempPathFactory) -> Path:
"""Module-scoped cache dir so the model downloads at most once per run.
Function scope would re-download on every test; session scope would
collide with the estimator smoke-test cache. Module scope is the
right middle ground for a file that only needs the model loaded
once.
"""
return tmp_path_factory.mktemp("neuropose_joint_names_model_cache")
def test_joint_names_match_pinned_model(metrabs_model_cache_dir: Path) -> None:
"""Hardcoded ``JOINT_NAMES`` must match the loaded MeTRAbs skeleton.
If this fails, the expected fix is:
1. Update :data:`neuropose.analyzer.segment.JOINT_NAMES` in the same
commit that bumps the model pin in :mod:`neuropose._model`.
2. Cross-check any CLI or docs that embed hardcoded joint names.
"""
model = load_metrabs_model(cache_dir=metrabs_model_cache_dir)
tensor = model.per_skeleton_joint_names["berkeley_mhad_43"]
model_names = tuple(tensor.numpy().astype(str).tolist())
assert model_names == JOINT_NAMES, (
"JOINT_NAMES drift detected — the hardcoded tuple in "
"neuropose.analyzer.segment no longer matches the MeTRAbs model. "
f"Model reports: {model_names}"
)

View File

@@ -0,0 +1,431 @@
"""Tests for :mod:`neuropose.analyzer.segment`.
Three layers of coverage:
- **Layer 1** (:func:`segment_by_peaks`) against synthetic 1D signals
with known peaks and valleys.
- **Layer 2** (:func:`segment_predictions`) against synthetic
:class:`VideoPredictions` fixtures exercising every extractor variant.
- **Slicing** (:func:`slice_predictions`) per-rep round-trip and the
metadata rewrite.
The extractor factories and the discriminated :class:`ExtractorSpec`
union are covered incidentally through Layer 2 (which is the only
layer that cares about the spec shape) plus a handful of targeted
schema tests for the validators (distinct joint indices etc.).
"""
from __future__ import annotations
import itertools
import math
import numpy as np
import pytest
from neuropose.analyzer.segment import (
JOINT_INDEX,
JOINT_NAMES,
extract_signal,
joint_angle,
joint_axis,
joint_index,
joint_pair_distance,
joint_speed,
segment_by_peaks,
segment_predictions,
slice_predictions,
)
from neuropose.io import (
JointAngleExtractor,
JointAxisExtractor,
JointPairDistanceExtractor,
JointSpeedExtractor,
Segment,
Segmentation,
VideoPredictions,
)
NUM_JOINTS = 43
def _triple_hump_signal(num_frames: int = 300) -> np.ndarray:
"""Three non-negative sine humps separated by clear zero-valleys."""
t = np.linspace(0.0, 6.0 * math.pi, num_frames)
return np.maximum(0.0, np.sin(t)) ** 2
def _make_predictions(
signal: np.ndarray,
joint: int,
*,
axis: int = 1,
fps: float = 30.0,
) -> VideoPredictions:
"""Build a VideoPredictions whose ``joint``'s ``axis`` follows ``signal``."""
frames = {}
for i, value in enumerate(signal):
poses = [[[0.0, 0.0, 0.0] for _ in range(NUM_JOINTS)]]
poses[0][joint][axis] = float(value)
frames[f"frame_{i:06d}"] = {
"boxes": [[0.0, 0.0, 1.0, 1.0, 0.9]],
"poses3d": poses,
"poses2d": [[[0.0, 0.0]] * NUM_JOINTS],
}
return VideoPredictions.model_validate(
{
"metadata": {
"frame_count": len(signal),
"fps": fps,
"width": 640,
"height": 480,
},
"frames": frames,
}
)
# ---------------------------------------------------------------------------
# JOINT_NAMES / JOINT_INDEX / joint_index()
# ---------------------------------------------------------------------------
class TestJointNames:
def test_tuple_length_is_43(self) -> None:
assert len(JOINT_NAMES) == 43
def test_index_matches_position(self) -> None:
for idx, name in enumerate(JOINT_NAMES):
assert JOINT_INDEX[name] == idx
def test_joint_index_by_name(self) -> None:
assert joint_index("lwri") == JOINT_NAMES.index("lwri")
assert joint_index("rwri") == JOINT_NAMES.index("rwri")
def test_joint_index_unknown_name(self) -> None:
with pytest.raises(KeyError, match="unknown joint name"):
joint_index("elbow") # deliberately wrong spelling
# ---------------------------------------------------------------------------
# Factory shortcuts
# ---------------------------------------------------------------------------
class TestFactories:
def test_joint_axis_factory(self) -> None:
spec = joint_axis(JOINT_INDEX["lwri"], 1, invert=True)
assert isinstance(spec, JointAxisExtractor)
assert spec.kind == "joint_axis"
assert spec.joint == JOINT_INDEX["lwri"]
assert spec.axis == 1
assert spec.invert is True
def test_joint_pair_distance_factory(self) -> None:
spec = joint_pair_distance(JOINT_INDEX["lwri"], JOINT_INDEX["rwri"])
assert isinstance(spec, JointPairDistanceExtractor)
assert spec.joints == (JOINT_INDEX["lwri"], JOINT_INDEX["rwri"])
def test_joint_pair_distance_rejects_same_joint(self) -> None:
from pydantic import ValidationError
with pytest.raises(ValidationError, match="distinct"):
joint_pair_distance(5, 5)
def test_joint_speed_factory(self) -> None:
spec = joint_speed(JOINT_INDEX["rwri"])
assert isinstance(spec, JointSpeedExtractor)
assert spec.joint == JOINT_INDEX["rwri"]
def test_joint_angle_factory(self) -> None:
spec = joint_angle(
JOINT_INDEX["larm"],
JOINT_INDEX["lelb"],
JOINT_INDEX["lwri"],
)
assert isinstance(spec, JointAngleExtractor)
assert spec.triplet == (
JOINT_INDEX["larm"],
JOINT_INDEX["lelb"],
JOINT_INDEX["lwri"],
)
# ---------------------------------------------------------------------------
# extract_signal: one test per extractor variant
# ---------------------------------------------------------------------------
class TestExtractSignal:
def test_joint_axis_selects_axis(self) -> None:
seq = np.zeros((4, NUM_JOINTS, 3))
seq[:, 10, 1] = [1.0, 2.0, 3.0, 4.0]
signal = extract_signal(seq, joint_axis(10, 1))
np.testing.assert_array_equal(signal, [1.0, 2.0, 3.0, 4.0])
def test_joint_axis_invert(self) -> None:
seq = np.zeros((3, NUM_JOINTS, 3))
seq[:, 0, 0] = [1.0, 2.0, 3.0]
signal = extract_signal(seq, joint_axis(0, 0, invert=True))
np.testing.assert_array_equal(signal, [-1.0, -2.0, -3.0])
def test_joint_pair_distance(self) -> None:
seq = np.zeros((3, NUM_JOINTS, 3))
seq[:, 0, 0] = [0.0, 0.0, 0.0]
seq[:, 1, 0] = [3.0, 6.0, 9.0] # distances 3, 6, 9 along x
signal = extract_signal(seq, joint_pair_distance(0, 1))
np.testing.assert_allclose(signal, [3.0, 6.0, 9.0])
def test_joint_speed_pads_first_frame_with_zero(self) -> None:
seq = np.zeros((4, NUM_JOINTS, 3))
seq[:, 5, 0] = [0.0, 1.0, 3.0, 6.0] # speeds: 1, 2, 3
signal = extract_signal(seq, joint_speed(5))
np.testing.assert_allclose(signal, [0.0, 1.0, 2.0, 3.0])
def test_joint_speed_single_frame_rejected(self) -> None:
seq = np.zeros((1, NUM_JOINTS, 3))
with pytest.raises(ValueError, match="at least two frames"):
extract_signal(seq, joint_speed(0))
def test_joint_angle_straight(self) -> None:
seq = np.zeros((2, NUM_JOINTS, 3))
# Straight line: a=(-1,0,0), b=(0,0,0), c=(1,0,0). Angle = pi.
seq[:, 0, 0] = [-1.0, -1.0]
seq[:, 1, 0] = [0.0, 0.0]
seq[:, 2, 0] = [1.0, 1.0]
signal = extract_signal(seq, joint_angle(0, 1, 2))
np.testing.assert_allclose(signal, [math.pi, math.pi])
def test_joint_angle_right(self) -> None:
seq = np.zeros((1, NUM_JOINTS, 3))
seq[0, 0] = [1.0, 0.0, 0.0]
seq[0, 1] = [0.0, 0.0, 0.0]
seq[0, 2] = [0.0, 1.0, 0.0]
signal = extract_signal(seq, joint_angle(0, 1, 2))
np.testing.assert_allclose(signal, [math.pi / 2])
def test_out_of_range_joint_index(self) -> None:
seq = np.zeros((3, NUM_JOINTS, 3))
with pytest.raises(ValueError, match="out of range"):
extract_signal(seq, joint_axis(999, 0))
def test_bad_sequence_shape(self) -> None:
with pytest.raises(ValueError, match="frames, joints, 3"):
extract_signal(np.zeros((5, 10)), joint_axis(0, 0))
# ---------------------------------------------------------------------------
# Layer 1: segment_by_peaks
# ---------------------------------------------------------------------------
class TestSegmentByPeaks:
def test_three_humps_three_segments(self) -> None:
signal = _triple_hump_signal()
segs = segment_by_peaks(signal, min_prominence=0.1)
assert len(segs) == 3
# Segments should not overlap (in this synthetic case). Adjacent
# segments are allowed to share the valley frame on their boundary,
# so we use a ``>=`` comparison with one frame of slack.
for prev, curr in itertools.pairwise(segs):
assert curr.start >= prev.end - 1
def test_first_segment_starts_at_zero_without_leading_valley(self) -> None:
signal = _triple_hump_signal()
segs = segment_by_peaks(signal, min_prominence=0.1)
assert segs[0].start == 0
def test_last_segment_ends_at_signal_length(self) -> None:
signal = _triple_hump_signal()
segs = segment_by_peaks(signal, min_prominence=0.1)
assert segs[-1].end == len(signal)
def test_peaks_lie_inside_segment(self) -> None:
signal = _triple_hump_signal()
segs = segment_by_peaks(signal, min_prominence=0.1)
for seg in segs:
assert seg.start <= seg.peak < seg.end
def test_no_peaks_returns_empty(self) -> None:
flat = np.zeros(50)
segs = segment_by_peaks(flat)
assert segs == []
def test_min_distance_suppresses_close_peaks(self) -> None:
# A signal with two very close peaks should give only one segment
# when min_distance is large.
signal = np.zeros(100)
signal[20] = 1.0
signal[25] = 1.0
signal[70] = 1.0
segs = segment_by_peaks(signal, min_distance=30)
# Exact count depends on scipy's tie-breaking; main assertion is
# "fewer than if no distance constraint".
segs_unconstrained = segment_by_peaks(signal)
assert len(segs) < len(segs_unconstrained)
def test_pad_extends_segment(self) -> None:
signal = _triple_hump_signal()
base = segment_by_peaks(signal, min_prominence=0.1)
padded = segment_by_peaks(signal, min_prominence=0.1, pad=5)
assert len(base) == len(padded)
for b, p in zip(base, padded, strict=True):
assert p.start <= b.start
assert p.end >= b.end
def test_pad_is_clamped_to_bounds(self) -> None:
signal = _triple_hump_signal()
padded = segment_by_peaks(signal, min_prominence=0.1, pad=10_000)
for seg in padded:
assert seg.start >= 0
assert seg.end <= len(signal)
def test_negative_pad_rejected(self) -> None:
with pytest.raises(ValueError, match="pad"):
segment_by_peaks(np.zeros(10), pad=-1)
def test_rejects_non_1d(self) -> None:
with pytest.raises(ValueError, match="1D"):
segment_by_peaks(np.zeros((5, 5)))
# ---------------------------------------------------------------------------
# Layer 2: segment_predictions
# ---------------------------------------------------------------------------
class TestSegmentPredictions:
def test_returns_segmentation_with_config(self) -> None:
signal = _triple_hump_signal() * 1000.0 # mm scale
preds = _make_predictions(signal, joint=JOINT_INDEX["lwri"])
result = segment_predictions(
preds,
joint_axis(JOINT_INDEX["lwri"], 1),
min_prominence=50.0,
)
assert isinstance(result, Segmentation)
assert len(result.segments) == 3
assert result.config.extractor.kind == "joint_axis"
assert result.config.min_prominence == 50.0
assert result.config.method == "valley_to_valley_v1"
def test_min_distance_seconds_converts_via_fps(self) -> None:
# 300 frames at 30 fps = 10 seconds; humps are ~3.3 s apart.
signal = _triple_hump_signal() * 1000.0
preds = _make_predictions(signal, joint=JOINT_INDEX["lwri"], fps=30.0)
# A 5-second minimum distance should collapse the three humps
# into at most two segments.
result = segment_predictions(
preds,
joint_axis(JOINT_INDEX["lwri"], 1),
min_prominence=50.0,
min_distance_seconds=5.0,
)
assert len(result.segments) <= 2
def test_pad_seconds_extends_segments(self) -> None:
signal = _triple_hump_signal() * 1000.0
preds = _make_predictions(signal, joint=JOINT_INDEX["lwri"], fps=30.0)
plain = segment_predictions(preds, joint_axis(JOINT_INDEX["lwri"], 1), min_prominence=50.0)
padded = segment_predictions(
preds,
joint_axis(JOINT_INDEX["lwri"], 1),
min_prominence=50.0,
pad_seconds=0.2, # ~6 frames at 30 fps
)
# At least one segment should have moved outward.
assert any(
p.start < b.start or p.end > b.end
for p, b in zip(padded.segments, plain.segments, strict=True)
)
def test_requires_fps_when_time_params_used(self) -> None:
signal = _triple_hump_signal() * 1000.0
preds = _make_predictions(signal, joint=JOINT_INDEX["lwri"], fps=0.0)
with pytest.raises(ValueError, match="fps"):
segment_predictions(
preds,
joint_axis(JOINT_INDEX["lwri"], 1),
min_prominence=50.0,
min_distance_seconds=1.0,
)
def test_no_fps_is_fine_without_time_params(self) -> None:
signal = _triple_hump_signal() * 1000.0
preds = _make_predictions(signal, joint=JOINT_INDEX["lwri"], fps=0.0)
# Without any time-based parameters we never need to multiply by
# fps, so fps=0 is tolerated.
result = segment_predictions(
preds,
joint_axis(JOINT_INDEX["lwri"], 1),
min_prominence=50.0,
)
assert len(result.segments) == 3
def test_config_roundtrips_through_json(self) -> None:
signal = _triple_hump_signal() * 1000.0
preds = _make_predictions(signal, joint=JOINT_INDEX["lwri"])
result = segment_predictions(
preds,
joint_pair_distance(JOINT_INDEX["lwri"], JOINT_INDEX["rwri"]),
min_prominence=10.0,
)
serialized = result.model_dump(mode="json")
rehydrated = Segmentation.model_validate(serialized)
assert rehydrated == result
# ---------------------------------------------------------------------------
# slice_predictions
# ---------------------------------------------------------------------------
class TestSlicePredictions:
def test_one_output_per_segment(self) -> None:
signal = _triple_hump_signal() * 1000.0
preds = _make_predictions(signal, joint=JOINT_INDEX["lwri"])
result = segment_predictions(preds, joint_axis(JOINT_INDEX["lwri"], 1), min_prominence=50.0)
slices = slice_predictions(preds, result.segments)
assert len(slices) == len(result.segments)
def test_metadata_frame_count_matches_segment_length(self) -> None:
signal = _triple_hump_signal() * 1000.0
preds = _make_predictions(signal, joint=JOINT_INDEX["lwri"])
segments = [Segment(start=10, end=30, peak=20), Segment(start=50, end=90, peak=75)]
slices = slice_predictions(preds, segments)
assert slices[0].metadata.frame_count == 20
assert slices[1].metadata.frame_count == 40
def test_frames_are_rekeyed_from_zero(self) -> None:
signal = _triple_hump_signal() * 1000.0
preds = _make_predictions(signal, joint=JOINT_INDEX["lwri"])
segments = [Segment(start=100, end=110, peak=105)]
sliced = slice_predictions(preds, segments)[0]
assert sliced.frame_names()[0] == "frame_000000"
assert sliced.frame_names()[-1] == "frame_000009"
def test_sliced_segmentations_field_is_empty(self) -> None:
# Parent has a segmentation attached; sliced copies intentionally
# drop it because segment indices are only meaningful in the
# parent's timeline.
signal = _triple_hump_signal() * 1000.0
preds = _make_predictions(signal, joint=JOINT_INDEX["lwri"])
result = segment_predictions(preds, joint_axis(JOINT_INDEX["lwri"], 1), min_prominence=50.0)
parent = preds.model_copy(update={"segmentations": {"cup_lift": result}})
slices = slice_predictions(parent, result.segments)
assert all(s.segmentations == {} for s in slices)
def test_out_of_bounds_segment_rejected(self) -> None:
signal = _triple_hump_signal(num_frames=50) * 1000.0
preds = _make_predictions(signal, joint=JOINT_INDEX["lwri"])
bad = [Segment(start=0, end=100, peak=25)] # end > 50
with pytest.raises(ValueError, match="exceeds frame count"):
slice_predictions(preds, bad)
def test_slice_preserves_pose_values(self) -> None:
signal = _triple_hump_signal() * 1000.0
preds = _make_predictions(signal, joint=JOINT_INDEX["lwri"])
segments = [Segment(start=50, end=55, peak=52)]
sliced = slice_predictions(preds, segments)[0]
# frame_000000 of the slice must equal frame_000050 of the source
assert sliced["frame_000000"].poses3d == preds["frame_000050"].poses3d

View File

@@ -0,0 +1,314 @@
"""Tests for :mod:`neuropose.benchmark`.
Coverage:
- :func:`run_benchmark` against a fake-model :class:`Estimator`:
repeats, warmup-pass discarding, capture-reference, edge cases.
- Aggregate statistics (mean / p50 / p95 / p99 / throughput / peak RSS)
on synthetic metrics.
- :func:`compute_poses3d_divergence` on matched and mismatched
prediction pairs.
- :func:`format_benchmark_report` rendering.
"""
from __future__ import annotations
from pathlib import Path
import pytest
from neuropose.benchmark import (
compute_poses3d_divergence,
format_benchmark_report,
run_benchmark,
)
from neuropose.estimator import Estimator
from neuropose.io import (
BenchmarkResult,
PerformanceMetrics,
VideoPredictions,
)
class TestRunBenchmark:
def test_returns_benchmark_result(
self,
synthetic_video: Path,
fake_metrabs_model,
) -> None:
estimator = Estimator(model=fake_metrabs_model)
outcome = run_benchmark(estimator, synthetic_video, repeats=3, warmup_frames=0)
assert isinstance(outcome.result, BenchmarkResult)
assert outcome.result.video_name == synthetic_video.name
def test_repeats_reflected_in_result(
self,
synthetic_video: Path,
fake_metrabs_model,
) -> None:
estimator = Estimator(model=fake_metrabs_model)
outcome = run_benchmark(estimator, synthetic_video, repeats=4, warmup_frames=0)
assert outcome.result.repeats == 4
# One pass is always discarded as warmup; the rest are measured.
assert len(outcome.result.measured_passes) == 3
assert outcome.result.aggregate.repeats_measured == 3
def test_first_pass_is_the_warmup(
self,
synthetic_video: Path,
fake_metrabs_model,
) -> None:
estimator = Estimator(model=fake_metrabs_model)
outcome = run_benchmark(estimator, synthetic_video, repeats=3, warmup_frames=0)
assert outcome.result.warmup_pass is not None
# warmup_pass is a separate object from the measured passes.
assert outcome.result.warmup_pass not in outcome.result.measured_passes
def test_capture_reference_off_by_default(
self,
synthetic_video: Path,
fake_metrabs_model,
) -> None:
estimator = Estimator(model=fake_metrabs_model)
outcome = run_benchmark(estimator, synthetic_video, repeats=3, warmup_frames=0)
assert outcome.reference_predictions is None
def test_capture_reference_returns_last_measured_pass(
self,
synthetic_video: Path,
fake_metrabs_model,
) -> None:
estimator = Estimator(model=fake_metrabs_model)
outcome = run_benchmark(
estimator,
synthetic_video,
repeats=3,
warmup_frames=0,
capture_reference=True,
)
assert isinstance(outcome.reference_predictions, VideoPredictions)
assert len(outcome.reference_predictions) == 5
def test_rejects_repeats_below_two(
self,
synthetic_video: Path,
fake_metrabs_model,
) -> None:
estimator = Estimator(model=fake_metrabs_model)
with pytest.raises(ValueError, match="repeats"):
run_benchmark(estimator, synthetic_video, repeats=1, warmup_frames=0)
def test_rejects_negative_warmup_frames(
self,
synthetic_video: Path,
fake_metrabs_model,
) -> None:
estimator = Estimator(model=fake_metrabs_model)
with pytest.raises(ValueError, match="warmup_frames"):
run_benchmark(estimator, synthetic_video, repeats=3, warmup_frames=-1)
def test_warmup_frames_filters_aggregate_head(
self,
synthetic_video: Path,
fake_metrabs_model,
) -> None:
estimator = Estimator(model=fake_metrabs_model)
# synthetic_video has 5 frames; warmup_frames=4 leaves only
# the last frame of each measured pass to contribute.
outcome = run_benchmark(estimator, synthetic_video, repeats=3, warmup_frames=4)
# The stats should still compute, not error.
assert outcome.result.aggregate.mean_frame_latency_ms >= 0.0
def test_warmup_frames_exceeding_pass_length_gives_zero_latencies(
self,
synthetic_video: Path,
fake_metrabs_model,
) -> None:
estimator = Estimator(model=fake_metrabs_model)
# 5-frame video, warmup_frames=10 → every frame discarded.
outcome = run_benchmark(estimator, synthetic_video, repeats=3, warmup_frames=10)
# Latency aggregate drops to zero; throughput is still
# computed from total wall clock.
assert outcome.result.aggregate.mean_frame_latency_ms == 0.0
assert outcome.result.aggregate.p95_frame_latency_ms == 0.0
class TestBenchmarkAggregate:
def test_percentiles_ordered_correctly(
self,
synthetic_video: Path,
fake_metrabs_model,
) -> None:
estimator = Estimator(model=fake_metrabs_model)
outcome = run_benchmark(estimator, synthetic_video, repeats=4, warmup_frames=0)
agg = outcome.result.aggregate
assert agg.p50_frame_latency_ms <= agg.p95_frame_latency_ms
assert agg.p95_frame_latency_ms <= agg.p99_frame_latency_ms
def test_throughput_non_negative(
self,
synthetic_video: Path,
fake_metrabs_model,
) -> None:
estimator = Estimator(model=fake_metrabs_model)
outcome = run_benchmark(estimator, synthetic_video, repeats=3, warmup_frames=0)
assert outcome.result.aggregate.mean_throughput_fps > 0.0
def test_peak_rss_matches_max_across_passes(
self,
synthetic_video: Path,
fake_metrabs_model,
) -> None:
estimator = Estimator(model=fake_metrabs_model)
outcome = run_benchmark(estimator, synthetic_video, repeats=3, warmup_frames=0)
individual = [p.peak_rss_mb for p in outcome.result.measured_passes]
assert outcome.result.aggregate.peak_rss_mb_max == max(individual)
class TestComputePoses3dDivergence:
def _make_predictions(self, poses3d_values: list[float]) -> VideoPredictions:
"""Build a 3-frame VideoPredictions whose single joint has the given y-values."""
frames: dict[str, dict] = {}
for i, value in enumerate(poses3d_values):
frames[f"frame_{i:06d}"] = {
"boxes": [[0.0, 0.0, 1.0, 1.0, 0.9]],
"poses3d": [[[0.0, float(value), 0.0]]],
"poses2d": [[[0.0, 0.0]]],
}
return VideoPredictions.model_validate(
{
"metadata": {
"frame_count": len(poses3d_values),
"fps": 30.0,
"width": 64,
"height": 64,
},
"frames": frames,
}
)
def test_identical_predictions_zero_divergence(self) -> None:
a = self._make_predictions([1.0, 2.0, 3.0])
b = self._make_predictions([1.0, 2.0, 3.0])
max_diff, compared = compute_poses3d_divergence(a, b)
assert max_diff == 0.0
assert compared == 3
def test_divergence_reports_max_abs_diff(self) -> None:
a = self._make_predictions([1.0, 2.0, 3.0])
b = self._make_predictions([1.0, 2.1, 2.5]) # diffs 0, 0.1, 0.5
max_diff, compared = compute_poses3d_divergence(a, b)
assert max_diff == pytest.approx(0.5)
assert compared == 3
def test_mismatched_frame_count_raises(self) -> None:
a = self._make_predictions([1.0, 2.0])
b = self._make_predictions([1.0, 2.0, 3.0])
with pytest.raises(ValueError, match="frame count"):
compute_poses3d_divergence(a, b)
def test_mismatched_detection_count_skipped_not_raised(self) -> None:
a = self._make_predictions([1.0, 2.0])
# Frame 1 has zero detections instead of one.
b_payload = {
"metadata": {
"frame_count": 2,
"fps": 30.0,
"width": 64,
"height": 64,
},
"frames": {
"frame_000000": {
"boxes": [[0, 0, 1, 1, 0.9]],
"poses3d": [[[0.0, 1.0, 0.0]]],
"poses2d": [[[0.0, 0.0]]],
},
"frame_000001": {
"boxes": [],
"poses3d": [],
"poses2d": [],
},
},
}
b = VideoPredictions.model_validate(b_payload)
max_diff, compared = compute_poses3d_divergence(a, b)
# First frame contributes 0 divergence; second frame is skipped.
assert max_diff == 0.0
assert compared == 1
class TestFormatBenchmarkReport:
def _fake_metrics(self, latencies_ms: list[float]) -> PerformanceMetrics:
return PerformanceMetrics(
model_load_seconds=12.3,
total_seconds=sum(latencies_ms) / 1000.0 + 0.01,
per_frame_latencies_ms=latencies_ms,
peak_rss_mb=512.0,
active_device="/CPU:0",
tensorflow_metal_active=False,
tensorflow_version="2.21.0",
)
def test_report_mentions_key_fields(
self,
synthetic_video: Path,
fake_metrabs_model,
) -> None:
estimator = Estimator(model=fake_metrabs_model)
outcome = run_benchmark(estimator, synthetic_video, repeats=3, warmup_frames=0)
text = format_benchmark_report(outcome.result)
assert "Benchmark:" in text
assert "device:" in text
assert "mean:" in text
assert "p50:" in text
assert "p95:" in text
assert "p99:" in text
assert "Throughput:" in text
assert "Peak RSS:" in text
def test_report_mentions_metal_when_active(self) -> None:
from neuropose.io import BenchmarkAggregate, BenchmarkResult
metrics = PerformanceMetrics(
model_load_seconds=None,
total_seconds=1.0,
per_frame_latencies_ms=[10.0, 11.0],
peak_rss_mb=100.0,
active_device="/GPU:0",
tensorflow_metal_active=True,
tensorflow_version="2.21.0",
)
agg = BenchmarkAggregate(
repeats_measured=1,
warmup_frames_per_pass=0,
mean_frame_latency_ms=10.5,
p50_frame_latency_ms=10.5,
p95_frame_latency_ms=11.0,
p99_frame_latency_ms=11.0,
stddev_frame_latency_ms=0.5,
mean_throughput_fps=95.0,
peak_rss_mb_max=100.0,
active_device="/GPU:0",
tensorflow_metal_active=True,
tensorflow_version="2.21.0",
)
result = BenchmarkResult(
video_name="test.mp4",
repeats=2,
warmup_frames=0,
warmup_pass=metrics,
measured_passes=[metrics],
aggregate=agg,
)
text = format_benchmark_report(result)
assert "tensorflow-metal" in text
def test_report_model_load_injected_label(
self,
synthetic_video: Path,
fake_metrabs_model,
) -> None:
estimator = Estimator(model=fake_metrabs_model)
outcome = run_benchmark(estimator, synthetic_video, repeats=3, warmup_frames=0)
text = format_benchmark_report(outcome.result)
assert "injected" in text # fake model was not loaded via load_model

View File

@@ -14,8 +14,10 @@ robust.
from __future__ import annotations
import math
from pathlib import Path
import numpy as np
import pytest
import yaml
from typer.testing import CliRunner
@@ -28,6 +30,15 @@ from neuropose.cli import (
EXIT_USAGE,
app,
)
from neuropose.io import (
JobResults,
VideoPredictions,
load_benchmark_result,
load_job_results,
load_video_predictions,
save_job_results,
save_video_predictions,
)
@pytest.fixture
@@ -57,9 +68,7 @@ def stub_metrabs_loader(monkeypatch: pytest.MonkeyPatch) -> None:
def fake_loader(cache_dir: Path | None = None) -> object:
del cache_dir
raise NotImplementedError("pending commit 11: MeTRAbs loader stubbed for unit testing")
monkeypatch.setattr("neuropose.estimator.load_metrabs_model", fake_loader)
@@ -83,6 +92,8 @@ class TestTopLevelOptions:
assert result.exit_code in (EXIT_OK, EXIT_USAGE)
assert "watch" in result.output
assert "process" in result.output
assert "segment" in result.output
assert "benchmark" in result.output
assert "analyze" in result.output
def test_help_flag(self, runner: CliRunner) -> None:
@@ -91,7 +102,7 @@ def test_subcommand_help(self, runner: CliRunner) -> None:
assert "NeuroPose" in result.output
def test_subcommand_help(self, runner: CliRunner) -> None:
for subcommand in ("watch", "process", "analyze"):
for subcommand in ("watch", "process", "segment", "benchmark", "analyze"):
result = runner.invoke(app, [subcommand, "--help"])
assert result.exit_code == EXIT_OK, f"{subcommand} --help failed"
@@ -204,6 +215,393 @@ class TestProcess:
assert "commit 11" in result.output
# ---------------------------------------------------------------------------
# segment
# ---------------------------------------------------------------------------
_NUM_JOINTS = 43
def _triple_hump_predictions(joint_name: str = "lwri") -> VideoPredictions:
"""Build a 300-frame synthetic VideoPredictions with three clear humps."""
from neuropose.analyzer.segment import JOINT_INDEX
joint = JOINT_INDEX[joint_name]
t = np.linspace(0.0, 6.0 * math.pi, 300)
signal = np.maximum(0.0, np.sin(t)) ** 2 * 1000.0
frames = {}
for i, value in enumerate(signal):
poses = [[[0.0, 0.0, 0.0] for _ in range(_NUM_JOINTS)]]
poses[0][joint][1] = float(value)
frames[f"frame_{i:06d}"] = {
"boxes": [[0.0, 0.0, 1.0, 1.0, 0.9]],
"poses3d": poses,
"poses2d": [[[0.0, 0.0]] * _NUM_JOINTS],
}
return VideoPredictions.model_validate(
{
"metadata": {
"frame_count": 300,
"fps": 30.0,
"width": 640,
"height": 480,
},
"frames": frames,
}
)
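# Why the signal above has exactly three humps: max(0, sin t)^2 on
# [0, 6*pi] is positive only on (0, pi), (2*pi, 3*pi), and (4*pi, 5*pi),
# one hump per positive half-cycle of sin. A minimal standalone check
# (not exercised by the suite; scipy.signal.find_peaks is a real API):
def _demo_three_humps() -> int:
    from scipy.signal import find_peaks

    t = np.linspace(0.0, 6.0 * math.pi, 300)
    signal = np.maximum(0.0, np.sin(t)) ** 2 * 1000.0
    peaks, _ = find_peaks(signal, prominence=50.0)
    return len(peaks)  # == 3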
class TestSegmentSubcommand:
def test_segment_job_results_in_place(self, runner: CliRunner, tmp_path: Path) -> None:
path = tmp_path / "results.json"
save_job_results(
path,
JobResults(root={"trial_01.mp4": _triple_hump_predictions()}),
)
result = runner.invoke(
app,
[
"segment",
str(path),
"--name",
"cup_lift",
"--extractor",
"joint_axis",
"--joint",
"lwri",
"--axis",
"1",
"--min-prominence",
"50",
],
)
assert result.exit_code == EXIT_OK, result.output
loaded = load_job_results(path)
vp = loaded["trial_01.mp4"]
assert "cup_lift" in vp.segmentations
assert len(vp.segmentations["cup_lift"].segments) == 3
def test_segment_single_predictions_file(self, runner: CliRunner, tmp_path: Path) -> None:
path = tmp_path / "video_predictions.json"
save_video_predictions(path, _triple_hump_predictions())
result = runner.invoke(
app,
[
"segment",
str(path),
"--name",
"cup_lift",
"--extractor",
"joint_pair_distance",
"--joints",
"lwri,rwri",
"--min-prominence",
"50",
],
)
# With rwri at the origin and lwri offset only along its y-coordinate,
# the pair distance is exactly the lwri.y signal, so the three humps
# are detected the same way as the joint_axis case.
assert result.exit_code == EXIT_OK, result.output
loaded = load_video_predictions(path)
assert "cup_lift" in loaded.segmentations
assert len(loaded.segmentations["cup_lift"].segments) == 3
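# Worked check of the comment above: with rwri pinned at the origin and
# lwri at (0, y, 0), the Euclidean pair distance is
# sqrt(0**2 + y**2 + 0**2) = |y| = y, because the hump signal is non-negative.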
def test_segment_writes_to_output_option(self, runner: CliRunner, tmp_path: Path) -> None:
src = tmp_path / "src.json"
dst = tmp_path / "dst.json"
save_job_results(src, JobResults(root={"trial_01.mp4": _triple_hump_predictions()}))
result = runner.invoke(
app,
[
"segment",
str(src),
"--name",
"cup_lift",
"--extractor",
"joint_axis",
"--joint",
"15", # integer form
"--axis",
"1",
"--min-prominence",
"50",
"--output",
str(dst),
],
)
assert result.exit_code == EXIT_OK, result.output
assert dst.exists()
# Source must be left untouched when --output is given.
src_loaded = load_job_results(src)
assert src_loaded["trial_01.mp4"].segmentations == {}
dst_loaded = load_job_results(dst)
assert "cup_lift" in dst_loaded["trial_01.mp4"].segmentations
def test_segment_rejects_collision_without_force(
self, runner: CliRunner, tmp_path: Path
) -> None:
path = tmp_path / "results.json"
save_job_results(path, JobResults(root={"trial_01.mp4": _triple_hump_predictions()}))
base_args = [
"segment",
str(path),
"--name",
"cup_lift",
"--extractor",
"joint_axis",
"--joint",
"lwri",
"--axis",
"1",
"--min-prominence",
"50",
]
first = runner.invoke(app, base_args)
assert first.exit_code == EXIT_OK, first.output
collision = runner.invoke(app, base_args)
assert collision.exit_code == EXIT_USAGE
assert "force" in collision.output.lower()
def test_segment_force_overwrites(self, runner: CliRunner, tmp_path: Path) -> None:
path = tmp_path / "results.json"
save_job_results(path, JobResults(root={"trial_01.mp4": _triple_hump_predictions()}))
args = [
"segment",
str(path),
"--name",
"cup_lift",
"--extractor",
"joint_axis",
"--joint",
"lwri",
"--axis",
"1",
"--min-prominence",
"50",
]
runner.invoke(app, args)
result = runner.invoke(app, [*args, "--force"])
assert result.exit_code == EXIT_OK, result.output
def test_segment_unknown_joint_name_is_usage_error(
self, runner: CliRunner, tmp_path: Path
) -> None:
path = tmp_path / "results.json"
save_job_results(path, JobResults(root={"trial_01.mp4": _triple_hump_predictions()}))
result = runner.invoke(
app,
[
"segment",
str(path),
"--name",
"cup_lift",
"--extractor",
"joint_axis",
"--joint",
"elbow", # deliberately not a valid berkeley_mhad_43 name
"--axis",
"1",
],
)
assert result.exit_code == EXIT_USAGE
assert "unknown joint" in result.output.lower()
def test_segment_missing_required_flag_is_usage_error(
self, runner: CliRunner, tmp_path: Path
) -> None:
path = tmp_path / "results.json"
save_job_results(path, JobResults(root={"trial_01.mp4": _triple_hump_predictions()}))
# joint_axis without --joint
result = runner.invoke(
app,
[
"segment",
str(path),
"--name",
"cup_lift",
"--extractor",
"joint_axis",
"--axis",
"1",
],
)
assert result.exit_code == EXIT_USAGE
def test_segment_unreadable_file_is_usage_error(
self, runner: CliRunner, tmp_path: Path
) -> None:
path = tmp_path / "garbage.json"
path.write_text("{ this is not valid json")
result = runner.invoke(
app,
[
"segment",
str(path),
"--name",
"cup_lift",
"--extractor",
"joint_axis",
"--joint",
"lwri",
"--axis",
"1",
],
)
assert result.exit_code == EXIT_USAGE
def test_segment_preserves_pose_values(self, runner: CliRunner, tmp_path: Path) -> None:
"""Segmentation only touches ``segmentations``; pose data is untouched."""
path = tmp_path / "results.json"
original = _triple_hump_predictions()
save_job_results(path, JobResults(root={"trial_01.mp4": original}))
result = runner.invoke(
app,
[
"segment",
str(path),
"--name",
"cup_lift",
"--extractor",
"joint_axis",
"--joint",
"lwri",
"--axis",
"1",
"--min-prominence",
"50",
],
)
assert result.exit_code == EXIT_OK, result.output
loaded = load_job_results(path)
lv = loaded["trial_01.mp4"]
assert lv.frame_names() == original.frame_names()
for name in original.frame_names():
assert lv[name].poses3d == original[name].poses3d
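# Sketch of a hypothetical downstream consumer (not part of the CLI),
# assuming half-open [start, end) windows and ordered frame names:
# slicing the frame-name list by each Segment recovers the frames of
# one repetition.
def _demo_frames_per_segment(vp: VideoPredictions, name: str) -> list[list[str]]:
    ordered = list(vp.frame_names())
    return [ordered[seg.start : seg.end] for seg in vp.segmentations[name].segments]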
# ---------------------------------------------------------------------------
# benchmark
# ---------------------------------------------------------------------------
@pytest.fixture
def stub_estimator_with_metrics(monkeypatch: pytest.MonkeyPatch):
"""Monkeypatch the benchmark's Estimator path to use a fake model.
The CLI's ``benchmark`` subcommand instantiates an :class:`Estimator`
and calls ``load_model``. We replace ``load_metrabs_model`` with a
fake that returns a deterministic stand-in so the CLI's
``load_model`` succeeds without downloading or touching TF. The
estimator's own metrics-collection path still runs, so the CLI
path is exercised end-to-end except for the real model.
"""
import numpy as np
class RecordingFake:
def detect_poses(self, image, **kwargs):
del image, kwargs
return {
"boxes": np.array([[0.0, 0.0, 1.0, 1.0, 0.9]]),
"poses3d": np.array([[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]]),
"poses2d": np.array([[[0.0, 0.0], [1.0, 1.0]]]),
}
def fake_loader(cache_dir: Path | None = None) -> object:
del cache_dir
return RecordingFake()
monkeypatch.setattr("neuropose.estimator.load_metrabs_model", fake_loader)
class TestBenchmarkSubcommand:
def test_benchmark_smoke(
self,
runner: CliRunner,
synthetic_video: Path,
tmp_path: Path,
stub_estimator_with_metrics,
) -> None:
del stub_estimator_with_metrics
output = tmp_path / "bench.json"
result = runner.invoke(
app,
[
"benchmark",
str(synthetic_video),
"--repeats",
"3",
"--warmup-frames",
"0",
"--output",
str(output),
],
)
assert result.exit_code == EXIT_OK, result.output
# Human-readable report must hit stdout.
assert "Benchmark:" in result.output
assert "Throughput:" in result.output
# JSON output file must exist and validate against the schema.
assert output.exists()
loaded = load_benchmark_result(output)
assert loaded.video_name == synthetic_video.name
assert loaded.repeats == 3
# repeats counts every pass; the first is the discarded warmup,
# so repeats=3 yields 2 measured passes.
assert len(loaded.measured_passes) == 2
assert loaded.cpu_comparison is None
def test_benchmark_rejects_repeats_below_two(
self,
runner: CliRunner,
synthetic_video: Path,
stub_estimator_with_metrics,
) -> None:
del stub_estimator_with_metrics
result = runner.invoke(app, ["benchmark", str(synthetic_video), "--repeats", "1"])
# Typer's min=2 validation catches this before our code runs.
assert result.exit_code != EXIT_OK
def test_benchmark_missing_video_is_usage_error(
self,
runner: CliRunner,
tmp_path: Path,
) -> None:
result = runner.invoke(app, ["benchmark", str(tmp_path / "nope.mp4")])
assert result.exit_code == EXIT_USAGE
def test_benchmark_force_cpu_and_compare_cpu_are_mutually_exclusive(
self,
runner: CliRunner,
synthetic_video: Path,
stub_estimator_with_metrics,
) -> None:
del stub_estimator_with_metrics
result = runner.invoke(
app,
[
"benchmark",
str(synthetic_video),
"--compare-cpu",
"--force-cpu",
],
)
assert result.exit_code == EXIT_USAGE
assert "mutually exclusive" in result.output
# ---------------------------------------------------------------------------
# analyze
# ---------------------------------------------------------------------------

View File

@ -19,7 +19,7 @@ from neuropose.estimator import (
ProcessVideoResult,
VideoDecodeError,
)
from neuropose.io import FramePrediction, VideoPredictions
from neuropose.io import FramePrediction, PerformanceMetrics, VideoPredictions
class TestConstruction:
@ -199,6 +199,119 @@ class TestProcessVideo:
assert len(fov_seen) == 5
class TestPerformanceMetrics:
def test_metrics_attached_to_result(
self,
synthetic_video: Path,
fake_metrabs_model,
) -> None:
estimator = Estimator(model=fake_metrabs_model)
result = estimator.process_video(synthetic_video)
assert isinstance(result.metrics, PerformanceMetrics)
def test_per_frame_latencies_length_matches_frames(
self,
synthetic_video: Path,
fake_metrabs_model,
) -> None:
estimator = Estimator(model=fake_metrabs_model)
result = estimator.process_video(synthetic_video)
assert len(result.metrics.per_frame_latencies_ms) == result.frame_count
def test_all_latencies_are_non_negative(
self,
synthetic_video: Path,
fake_metrabs_model,
) -> None:
estimator = Estimator(model=fake_metrabs_model)
result = estimator.process_video(synthetic_video)
assert all(v >= 0.0 for v in result.metrics.per_frame_latencies_ms)
def test_total_seconds_is_positive(
self,
synthetic_video: Path,
fake_metrabs_model,
) -> None:
estimator = Estimator(model=fake_metrabs_model)
result = estimator.process_video(synthetic_video)
assert result.metrics.total_seconds > 0.0
def test_peak_rss_is_positive(
self,
synthetic_video: Path,
fake_metrabs_model,
) -> None:
estimator = Estimator(model=fake_metrabs_model)
result = estimator.process_video(synthetic_video)
# psutil always reports at least a few MB of RSS for a running
# Python process; the exact number varies by platform.
assert result.metrics.peak_rss_mb > 0.0
def test_model_load_seconds_none_when_injected(
self,
synthetic_video: Path,
fake_metrabs_model,
) -> None:
estimator = Estimator(model=fake_metrabs_model)
result = estimator.process_video(synthetic_video)
assert result.metrics.model_load_seconds is None
def test_model_load_seconds_set_after_load(
self,
synthetic_video: Path,
monkeypatch: pytest.MonkeyPatch,
) -> None:
"""``load_model`` should set ``model_load_seconds`` on the next call.
We stub the loader to return a recording fake model, time how long
the estimator's ``load_model`` takes, and verify the number ends
up on the metrics object.
"""
import numpy as np
class Recorder:
def detect_poses(self, image, **kwargs):
del image, kwargs
return {
"boxes": np.array([[0.0, 0.0, 32.0, 32.0, 0.9]]),
"poses3d": np.array([[[0.0, 0.0, 0.0]]]),
"poses2d": np.array([[[0.0, 0.0]]]),
}
def fake_loader(cache_dir: Path | None = None) -> object:
del cache_dir
return Recorder()
monkeypatch.setattr("neuropose.estimator.load_metrabs_model", fake_loader)
estimator = Estimator()
estimator.load_model()
result = estimator.process_video(synthetic_video)
assert result.metrics.model_load_seconds is not None
assert result.metrics.model_load_seconds >= 0.0
def test_active_device_string_populated(
self,
synthetic_video: Path,
fake_metrabs_model,
) -> None:
estimator = Estimator(model=fake_metrabs_model)
result = estimator.process_video(synthetic_video)
# The exact string depends on the runner's TF install, but it must
# be one of the two canonical device names or the "unknown" fallback.
assert result.metrics.active_device in {"/CPU:0", "/GPU:0", "unknown"}
def test_tensorflow_version_populated(
self,
synthetic_video: Path,
fake_metrabs_model,
) -> None:
estimator = Estimator(model=fake_metrabs_model)
result = estimator.process_video(synthetic_video)
# TF is in the dev deps so the version should always be a real
# string, not the "unknown" fallback.
assert result.metrics.tensorflow_version not in {"", "unknown"}
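# How a single RSS sample with psutil typically looks (a sketch; where and
# how often the estimator actually samples is an implementation detail):
def _demo_rss_mb() -> float:
    import psutil

    return psutil.Process().memory_info().rss / (1024 * 1024)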
class TestErrors:
def test_missing_video(
self,

View File

@ -10,15 +10,28 @@ import pytest
from pydantic import ValidationError
from neuropose.io import (
BenchmarkAggregate,
BenchmarkResult,
CpuComparisonResult,
FramePrediction,
JobResults,
JobStatus,
JointAngleExtractor,
JointAxisExtractor,
JointPairDistanceExtractor,
JointSpeedExtractor,
PerformanceMetrics,
Segment,
Segmentation,
SegmentationConfig,
StatusFile,
VideoMetadata,
VideoPredictions,
load_benchmark_result,
load_job_results,
load_status,
load_video_predictions,
save_benchmark_result,
save_job_results,
save_status,
save_video_predictions,
@ -188,6 +201,239 @@ class TestJobResults:
assert len(loaded["video_a.mp4"]) == 2
# ---------------------------------------------------------------------------
# Performance / benchmark schemas
# ---------------------------------------------------------------------------
def _make_metrics(
*,
total_seconds: float = 1.0,
latencies: list[float] | None = None,
peak_rss_mb: float = 512.0,
active_device: str = "/CPU:0",
metal_active: bool = False,
model_load_seconds: float | None = None,
) -> PerformanceMetrics:
return PerformanceMetrics(
model_load_seconds=model_load_seconds,
total_seconds=total_seconds,
per_frame_latencies_ms=latencies if latencies is not None else [10.0, 11.0, 9.5],
peak_rss_mb=peak_rss_mb,
active_device=active_device,
tensorflow_metal_active=metal_active,
tensorflow_version="2.21.0",
)
def _make_aggregate() -> BenchmarkAggregate:
return BenchmarkAggregate(
repeats_measured=4,
warmup_frames_per_pass=3,
mean_frame_latency_ms=10.0,
p50_frame_latency_ms=9.8,
p95_frame_latency_ms=12.5,
p99_frame_latency_ms=13.0,
stddev_frame_latency_ms=0.7,
mean_throughput_fps=100.0,
peak_rss_mb_max=512.0,
active_device="/CPU:0",
tensorflow_metal_active=False,
tensorflow_version="2.21.0",
)
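# One plausible way the aggregate could be reduced from the measured passes
# (a sketch; the real reducer in neuropose may differ, though np.percentile
# and friends are real numpy APIs):
def _demo_reduce(passes: list[PerformanceMetrics]) -> dict[str, float]:
    import numpy as np

    lat = np.concatenate([np.asarray(p.per_frame_latencies_ms) for p in passes])
    fps = [len(p.per_frame_latencies_ms) / p.total_seconds for p in passes]
    return {
        "mean_frame_latency_ms": float(lat.mean()),
        "p50_frame_latency_ms": float(np.percentile(lat, 50)),
        "p95_frame_latency_ms": float(np.percentile(lat, 95)),
        "p99_frame_latency_ms": float(np.percentile(lat, 99)),
        "stddev_frame_latency_ms": float(lat.std()),
        "mean_throughput_fps": float(np.mean(fps)),
        "peak_rss_mb_max": max(p.peak_rss_mb for p in passes),
    }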
class TestPerformanceMetricsModel:
def test_roundtrip(self) -> None:
m = _make_metrics()
rehydrated = PerformanceMetrics.model_validate(m.model_dump(mode="json"))
assert rehydrated == m
def test_rejects_negative_total_seconds(self) -> None:
with pytest.raises(ValidationError):
PerformanceMetrics(
total_seconds=-1.0,
peak_rss_mb=0.0,
active_device="/CPU:0",
tensorflow_version="2.21.0",
)
def test_rejects_negative_peak_rss(self) -> None:
with pytest.raises(ValidationError):
PerformanceMetrics(
total_seconds=1.0,
peak_rss_mb=-5.0,
active_device="/CPU:0",
tensorflow_version="2.21.0",
)
def test_model_load_seconds_optional(self) -> None:
m = _make_metrics(model_load_seconds=None)
assert m.model_load_seconds is None
def test_is_frozen(self) -> None:
m = _make_metrics()
with pytest.raises(ValidationError):
m.total_seconds = 2.0
class TestBenchmarkResultPersistence:
def test_roundtrip_to_disk(self, tmp_path: Path) -> None:
result = BenchmarkResult(
video_name="trial.mp4",
repeats=5,
warmup_frames=3,
warmup_pass=_make_metrics(total_seconds=20.0),
measured_passes=[_make_metrics(total_seconds=1.5) for _ in range(4)],
aggregate=_make_aggregate(),
)
path = tmp_path / "bench.json"
save_benchmark_result(path, result)
assert path.exists()
loaded = load_benchmark_result(path)
assert loaded == result
def test_rejects_repeats_below_one(self) -> None:
with pytest.raises(ValidationError):
BenchmarkResult(
video_name="x.mp4",
repeats=0,
warmup_frames=0,
warmup_pass=_make_metrics(),
measured_passes=[],
aggregate=_make_aggregate(),
)
def test_cpu_comparison_nested(self, tmp_path: Path) -> None:
comparison = CpuComparisonResult(
primary_aggregate=_make_aggregate(),
cpu_aggregate=_make_aggregate(),
speedup=2.5,
max_poses3d_divergence_mm=0.002,
frame_count_compared=30,
)
result = BenchmarkResult(
video_name="trial.mp4",
repeats=5,
warmup_frames=3,
warmup_pass=_make_metrics(),
measured_passes=[_make_metrics() for _ in range(4)],
aggregate=_make_aggregate(),
cpu_comparison=comparison,
)
path = tmp_path / "bench_with_cmp.json"
save_benchmark_result(path, result)
loaded = load_benchmark_result(path)
assert loaded.cpu_comparison is not None
assert loaded.cpu_comparison.speedup == pytest.approx(2.5)
assert loaded.cpu_comparison.max_poses3d_divergence_mm == pytest.approx(0.002)
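# How the two comparison numbers asserted above plug together, under
# assumed semantics (speedup as primary throughput over CPU throughput;
# divergence as the largest element-wise pose delta in millimetres).
# A sketch, not the real comparison code:
def _demo_comparison(primary_fps: float, cpu_fps: float, poses_a, poses_b) -> tuple[float, float]:
    import numpy as np

    speedup = primary_fps / cpu_fps
    divergence_mm = float(np.max(np.abs(np.asarray(poses_a) - np.asarray(poses_b))))
    return speedup, divergence_mm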
# ---------------------------------------------------------------------------
# Segmentation schema
# ---------------------------------------------------------------------------
class TestSegmentModel:
def test_valid(self) -> None:
seg = Segment(start=0, end=30, peak=15)
assert seg.start == 0
assert seg.peak == 15
assert seg.end == 30
def test_rejects_end_not_greater_than_start(self) -> None:
with pytest.raises(ValidationError, match="end"):
Segment(start=10, end=10, peak=10)
def test_peak_must_be_inside_window(self) -> None:
with pytest.raises(ValidationError, match="peak"):
Segment(start=0, end=30, peak=30) # peak == end is out of range
def test_is_frozen(self) -> None:
seg = Segment(start=0, end=10, peak=5)
with pytest.raises(ValidationError):
seg.start = 1
class TestExtractorSpecs:
def test_joint_pair_distance_rejects_identical_joints(self) -> None:
with pytest.raises(ValidationError, match="distinct"):
JointPairDistanceExtractor(joints=(7, 7))
def test_joint_pair_distance_rejects_negative(self) -> None:
with pytest.raises(ValidationError, match="non-negative"):
JointPairDistanceExtractor(joints=(-1, 5))
def test_joint_angle_rejects_negative(self) -> None:
with pytest.raises(ValidationError, match="non-negative"):
JointAngleExtractor(triplet=(0, -1, 2))
def test_joint_axis_rejects_bad_axis(self) -> None:
with pytest.raises(ValidationError):
JointAxisExtractor(joint=0, axis=3)
def test_discriminator_dispatches_to_correct_variant(self) -> None:
# Round-trip each extractor variant through a SegmentationConfig
# dict to confirm the discriminator selects the right class.
for payload, cls in [
({"kind": "joint_axis", "joint": 1, "axis": 2}, JointAxisExtractor),
({"kind": "joint_pair_distance", "joints": [1, 2]}, JointPairDistanceExtractor),
({"kind": "joint_speed", "joint": 3}, JointSpeedExtractor),
({"kind": "joint_angle", "triplet": [1, 2, 3]}, JointAngleExtractor),
]:
cfg = SegmentationConfig.model_validate({"extractor": payload})
assert isinstance(cfg.extractor, cls)
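# The dispatch verified above is pydantic's tagged-union pattern. A minimal
# self-contained sketch of how such a union is declared (illustrative; the
# real neuropose.io definitions may differ):
def _demo_discriminated_union() -> None:
    from typing import Annotated, Literal, Union

    from pydantic import BaseModel, Field

    class AxisDemo(BaseModel):
        kind: Literal["joint_axis"] = "joint_axis"
        joint: int
        axis: int

    class SpeedDemo(BaseModel):
        kind: Literal["joint_speed"] = "joint_speed"
        joint: int

    class ConfigDemo(BaseModel):
        extractor: Annotated[Union[AxisDemo, SpeedDemo], Field(discriminator="kind")]

    cfg = ConfigDemo.model_validate({"extractor": {"kind": "joint_speed", "joint": 3}})
    assert isinstance(cfg.extractor, SpeedDemo)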
class TestSegmentationPersistence:
def test_roundtrip_through_video_predictions(
self,
tmp_path: Path,
video_predictions_payload: dict,
) -> None:
cfg = SegmentationConfig(
extractor=JointAxisExtractor(joint=15, axis=1),
min_prominence=20.0,
pad_seconds=0.1,
)
segmentation = Segmentation(
config=cfg,
segments=[Segment(start=0, end=1, peak=0), Segment(start=1, end=2, peak=1)],
)
video_predictions_payload["segmentations"] = {
"cup_lift": segmentation.model_dump(mode="json")
}
vp = VideoPredictions.model_validate(video_predictions_payload)
path = tmp_path / "video.json"
save_video_predictions(path, vp)
loaded = load_video_predictions(path)
assert "cup_lift" in loaded.segmentations
cup = loaded.segmentations["cup_lift"]
assert cup.config.extractor.kind == "joint_axis"
assert len(cup.segments) == 2
assert cup.config.method == "valley_to_valley_v1"
def test_default_empty_segmentations_on_new_instance(
self, video_predictions_payload: dict
) -> None:
vp = VideoPredictions.model_validate(video_predictions_payload)
assert vp.segmentations == {}
def test_legacy_file_without_segmentations_loads_clean(
self,
tmp_path: Path,
video_predictions_payload: dict,
) -> None:
# Older predictions files never wrote the segmentations field;
# make sure they still validate and deserialize as if they had
# an empty mapping.
assert "segmentations" not in video_predictions_payload
path = tmp_path / "legacy.json"
path.write_text(json.dumps(video_predictions_payload))
vp = load_video_predictions(path)
assert vp.segmentations == {}
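# The legacy behaviour above falls out of a defaulted field. A runnable
# sketch of the likely shape (a hypothetical mirror, not the real
# VideoPredictions model):
def _demo_legacy_default() -> None:
    from pydantic import BaseModel, Field

    class LegacyDemo(BaseModel):
        segmentations: dict[str, dict] = Field(default_factory=dict)

    assert LegacyDemo.model_validate({}).segmentations == {}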
# ---------------------------------------------------------------------------
# Status file
# ---------------------------------------------------------------------------

uv.lock
View File

@ -673,6 +673,7 @@ dependencies = [
{ name = "matplotlib" },
{ name = "numpy" },
{ name = "opencv-python-headless" },
{ name = "psutil" },
{ name = "pydantic" },
{ name = "pydantic-settings" },
{ name = "pyyaml" },
@ -711,6 +712,7 @@ requires-dist = [
{ name = "matplotlib", specifier = ">=3.8" },
{ name = "numpy", specifier = ">=1.26" },
{ name = "opencv-python-headless", specifier = ">=4.9" },
{ name = "psutil", specifier = ">=5.9" },
{ name = "pydantic", specifier = ">=2.6" },
{ name = "pydantic-settings", specifier = ">=2.2" },
{ name = "pyyaml", specifier = ">=6.0" },
@ -949,6 +951,22 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/88/95/608f665226bca68b736b79e457fded9a2a38c4f4379a4a7614303d9db3bc/protobuf-7.34.1-py3-none-any.whl", hash = "sha256:bb3812cd53aefea2b028ef42bd780f5b96407247f20c6ef7c679807e9d188f11", size = 170715, upload-time = "2026-03-20T17:34:45.384Z" },
]
[[package]]
name = "psutil"
version = "7.2.2"
source = { registry = "https://pypi.org/simple" }
sdist = { url = "https://files.pythonhosted.org/packages/aa/c6/d1ddf4abb55e93cebc4f2ed8b5d6dbad109ecb8d63748dd2b20ab5e57ebe/psutil-7.2.2.tar.gz", hash = "sha256:0746f5f8d406af344fd547f1c8daa5f5c33dbc293bb8d6a16d80b4bb88f59372", size = 493740, upload-time = "2026-01-28T18:14:54.428Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/e7/36/5ee6e05c9bd427237b11b3937ad82bb8ad2752d72c6969314590dd0c2f6e/psutil-7.2.2-cp36-abi3-macosx_10_9_x86_64.whl", hash = "sha256:ed0cace939114f62738d808fdcecd4c869222507e266e574799e9c0faa17d486", size = 129090, upload-time = "2026-01-28T18:15:22.168Z" },
{ url = "https://files.pythonhosted.org/packages/80/c4/f5af4c1ca8c1eeb2e92ccca14ce8effdeec651d5ab6053c589b074eda6e1/psutil-7.2.2-cp36-abi3-macosx_11_0_arm64.whl", hash = "sha256:1a7b04c10f32cc88ab39cbf606e117fd74721c831c98a27dc04578deb0c16979", size = 129859, upload-time = "2026-01-28T18:15:23.795Z" },
{ url = "https://files.pythonhosted.org/packages/b5/70/5d8df3b09e25bce090399cf48e452d25c935ab72dad19406c77f4e828045/psutil-7.2.2-cp36-abi3-manylinux2010_x86_64.manylinux_2_12_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:076a2d2f923fd4821644f5ba89f059523da90dc9014e85f8e45a5774ca5bc6f9", size = 155560, upload-time = "2026-01-28T18:15:25.976Z" },
{ url = "https://files.pythonhosted.org/packages/63/65/37648c0c158dc222aba51c089eb3bdfa238e621674dc42d48706e639204f/psutil-7.2.2-cp36-abi3-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:b0726cecd84f9474419d67252add4ac0cd9811b04d61123054b9fb6f57df6e9e", size = 156997, upload-time = "2026-01-28T18:15:27.794Z" },
{ url = "https://files.pythonhosted.org/packages/8e/13/125093eadae863ce03c6ffdbae9929430d116a246ef69866dad94da3bfbc/psutil-7.2.2-cp36-abi3-musllinux_1_2_aarch64.whl", hash = "sha256:fd04ef36b4a6d599bbdb225dd1d3f51e00105f6d48a28f006da7f9822f2606d8", size = 148972, upload-time = "2026-01-28T18:15:29.342Z" },
{ url = "https://files.pythonhosted.org/packages/04/78/0acd37ca84ce3ddffaa92ef0f571e073faa6d8ff1f0559ab1272188ea2be/psutil-7.2.2-cp36-abi3-musllinux_1_2_x86_64.whl", hash = "sha256:b58fabe35e80b264a4e3bb23e6b96f9e45a3df7fb7eed419ac0e5947c61e47cc", size = 148266, upload-time = "2026-01-28T18:15:31.597Z" },
{ url = "https://files.pythonhosted.org/packages/b4/90/e2159492b5426be0c1fef7acba807a03511f97c5f86b3caeda6ad92351a7/psutil-7.2.2-cp37-abi3-win_amd64.whl", hash = "sha256:eb7e81434c8d223ec4a219b5fc1c47d0417b12be7ea866e24fb5ad6e84b3d988", size = 137737, upload-time = "2026-01-28T18:15:33.849Z" },
{ url = "https://files.pythonhosted.org/packages/8c/c7/7bb2e321574b10df20cbde462a94e2b71d05f9bbda251ef27d104668306a/psutil-7.2.2-cp37-abi3-win_arm64.whl", hash = "sha256:8c233660f575a5a89e6d4cb65d9f938126312bca76d8fe087b947b3a1aaac9ee", size = 134617, upload-time = "2026-01-28T18:15:36.514Z" },
]
[[package]]
name = "pydantic"
version = "2.13.0"