# Changelog All notable changes to NeuroPose are recorded in this file. The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/), and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html). ## [Unreleased] This section covers the ground-up rewrite of NeuroPose. The entries below describe the difference between the previous internal prototype and the state of the repository at the first tagged release, and will be split into per-release sections once tagging begins. ### Added #### Package structure and tooling - `src/neuropose/` package layout with `py.typed` marker, MIT `LICENSE`, policy-enforcing `.gitignore`, pinned Python 3.11 (`.python-version`), and `pyproject.toml` with full project metadata, classifiers, and URL pointers. The runtime TensorFlow dependency is pinned to `tensorflow>=2.16,<2.19` — see *Changed* below for the rationale. `psutil>=5.9` is a runtime dependency used by the estimator's always-on `PerformanceMetrics` collection to sample peak RSS. - `[project.optional-dependencies].analysis` extra for fastdtw, scipy, scikit-learn, and sktime — install via `pip install neuropose[analysis]`. - `[project.optional-dependencies].metal` extra pulling `tensorflow-metal>=1.2,<2` under `sys_platform == 'darwin' and platform_machine == 'arm64'` environment markers. Opt-in only via `pip install 'neuropose[metal]'` or `uv sync --extra metal`; silently no-op on every non-Apple-Silicon platform. The Metal path is **not** exercised in CI and is documented as experimental in `docs/getting-started.md` — users enabling it are expected to spot-check numerics against the CPU path before trusting results downstream. - `[dependency-groups].dev` (PEP 735) with the full dev + docs + analyzer toolchain: pytest, pytest-cov, ruff, pyright, pre-commit, mkdocs-material, mkdocstrings, fastdtw, and scipy. `uv sync --group dev` gives contributors everything needed to run the whole suite. - `AUTHORS.md`, `CITATION.cff` (with a MeTRAbs upstream `references:` entry), and a MIT-licensed `LICENSE` with an explicit MeTRAbs attribution paragraph. - Pre-commit configuration (`.pre-commit-config.yaml`) running ruff, ruff-format, gitleaks (secret scanning), a 500 KB-limit large-files hook, end-of-file fixers, trailing-whitespace fixers, and YAML/TOML/JSON validators. Pyright is deliberately **not** in pre-commit — it runs in CI only, so pre-commit stays fast. - Ruff configuration in `pyproject.toml` with a deliberately broad rule selection (pycodestyle, pyflakes, isort, bugbear, pyupgrade, simplify, ruff-specific, pep8-naming, comprehensions, pathlib, pytest-style, tidy-imports, numpy-specific, pydocstyle with numpy convention). Per-file ignores for tests and private modules. - Pyright configuration in `standard` mode (not `strict` — TF/OpenCV stubs would otherwise drown the signal). Unknown-type reports are explicitly silenced until the TensorFlow version pin is settled. - Pytest configuration with strict markers, an opt-in `slow` marker, and a `--runslow` CLI flag implemented in `tests/conftest.py::pytest_collection_modifyitems` so integration tests stay out of the default run. #### CI / infrastructure - GitHub Actions workflow `.github/workflows/ci.yml` running three parallel jobs — **lint** (ruff), **typecheck** (pyright), and **test** (pytest) — on every push and PR to `main`. Uses `uv` with a pinned version (`0.9.16`) and cache-enabled setup for fast reruns. Concurrency control cancels superseded runs on the same branch. - GitHub Actions workflow `.github/workflows/docs.yml` that builds the mkdocs-material site on every relevant push and uploads the rendered site as a 14-day workflow artifact. GitHub Pages deployment is intentionally not wired up yet; the workflow header comment describes what to add when the repo flips public. #### Runtime modules - **`neuropose.config`** — `Settings` class built on `pydantic-settings`. Field-level validation for `device`, `poll_interval_seconds`, and `default_fov_degrees`; explicit `from_yaml()` classmethod (no implicit config-file discovery); XDG defaults for `data_dir` and `model_cache_dir` (`~/.local/share/neuropose/…`) so runtime data never lives inside the repository; `ensure_dirs()` as an explicit method so construction remains filesystem-side-effect-free. - **`neuropose.io`** — validated prediction schemas: `FramePrediction` (frozen), `VideoMetadata` (frame count, fps, width, height), `VideoPredictions` (metadata envelope + frames mapping + optional `segmentations` field), `JobResults`, `JobStatus` enum, `JobStatusEntry` (with a structured `error` field plus optional live-progress fields — `current_video`, `frames_processed`, `frames_total`, `videos_completed`, `videos_total`, `percent_complete`, `last_update` — populated by the interfacer during inference and consumed by `neuropose.monitor`), and `StatusFile`. Legacy status files written before the progress fields existed still load cleanly because every new field is optional with a `None` default. Performance schema: frozen `PerformanceMetrics` carrying per-call timings (`model_load_seconds`, `total_seconds`, `per_frame_latencies_ms`), `peak_rss_mb`, `active_device`, `tensorflow_metal_active`, and `tensorflow_version`; `BenchmarkResult` pairing a discarded `warmup_pass` with `measured_passes` and a `BenchmarkAggregate` (mean / p50 / p95 / p99 per-frame latency, mean throughput, max peak RSS); optional `CpuComparisonResult` nested inside `BenchmarkResult` for `--compare-cpu` runs, carrying both device aggregates, the throughput speedup, and the maximum-element-wise `poses3d` divergence in millimetres. Segmentation schema: frozen `Segment` windows (`start`, `end`, `peak`), `SegmentationConfig` (with a `method` version literal, e.g. `valley_to_valley_v1`), a discriminated `ExtractorSpec` union over `JointAxisExtractor`, `JointPairDistanceExtractor`, `JointSpeedExtractor`, and `JointAngleExtractor`, and `Segmentation` pairing a config with its segments so on-disk results are self-describing. Load and save helpers with an atomic tmp-file-then-rename pattern for every state file. `load_benchmark_result` / `save_benchmark_result` follow the same atomic pattern. `load_status` is deliberately crash-resilient: missing, corrupt, or non-mapping JSON returns an empty `StatusFile` rather than raising. Legacy predictions files without the `segmentations` field deserialize cleanly to an empty mapping. - **`neuropose.estimator`** — `Estimator` class that streams frames directly from OpenCV into the model, with no intermediate write-to- disk-then-read-back-as-PNG round trip. Returns a typed `ProcessVideoResult` containing a validated `VideoPredictions` object and an always-populated `PerformanceMetrics` bundle (per- frame latency in ms, total wall clock, peak RSS via `psutil`, active TF device string, `tensorflow-metal` detection, TF version, and model load time when the caller went through `load_model()`). Does not touch the filesystem. Constructor accepts an injected model for testability; `load_model()` delegates to `neuropose._model.load_metrabs_model()`. Typed exception hierarchy: `EstimatorError`, `ModelNotLoadedError`, `VideoDecodeError`. Optional per-frame `progress` callback for long videos. Frame identifier convention is `frame_000000` (six-digit zero-pad, no extension — no file is implied). - **`neuropose.visualize`** — `visualize_predictions()` for per-frame 2D + 3D overlay rendering. `matplotlib.use("Agg")` is called inside the function rather than at module import, so `import neuropose.visualize` has no global side effect. Explicit deep-copy of `poses3d` before axis rotation to prevent the aliasing bug from the previous prototype. Supports `frame_indices` for rendering a subset of frames. - **`neuropose.interfacer`** — `Interfacer` job-lifecycle daemon with dependency-injected `Settings` and `Estimator`. Single-instance enforcement via `fcntl.flock` on `data_dir/.neuropose.lock`. Crash-recovery `recover_stuck_jobs()` that marks any status entries left in `processing` state as failed with an "interrupted" message and quarantines their inputs. Graceful shutdown on SIGINT/ SIGTERM with an interruptible sleep. Structured error fields on every failed job. `run_once()` factored out of the main loop so tests can drive single iterations without threading. Quarantine collision handling (`job_a.1`, `job_a.2`, …) and empty-directory silent-skip heuristic (mid-copy directories are not marked failed). - **`neuropose._model`** — MeTRAbs model loader. Downloads the pinned tarball from the upstream RWTH Aachen URL (`metrabs_eff2l_y4_384px_800k_28ds.tar.gz`), verifies its SHA-256 checksum, atomically extracts to a staging directory and renames into place, and loads via `tf.saved_model.load`. Streams the download and hash computation in 1 MB chunks so memory is flat. One automatic retry on SHA-256 mismatch (in case the previous download was truncated). Post-load interface check for `detect_poses`, `per_skeleton_joint_names`, and `per_skeleton_joint_edges`. - **`neuropose.monitor`** — localhost HTTP status dashboard. A small `http.server`-based HTTP server (pure stdlib, zero new runtime dependencies) that serves a plain HTML page at `GET /` with an auto-refresh meta tag, one row per tracked job, a `` bar, and a stale-entry warning badge for `processing` jobs whose `last_update` has not ticked in 60 s. `GET /status.json` returns the raw validated `StatusFile` as JSON for `curl`/scripted pipelines; `?job=` filters to a single entry. `GET /health` is a simple liveness probe. Binds to `127.0.0.1:8765` by default — loopback-only, with an explicit `--host` override required to expose externally. Every request re-reads `status.json`, so the monitor has no in-memory cache, no sync protocol with the daemon, and stays useful even if the daemon is down (last-known state surfaced with the stale badge). - **Progress checkpointing in the interfacer.** `Interfacer` now updates the currently-running job's `JobStatusEntry` every `settings.status_checkpoint_every_frames` frames (default 30, a new `Settings` field) during inference via the estimator's `progress` callback. Each checkpoint rewrites `status.json` atomically through the existing `save_status` helper; writes are best-effort and I/O failures are logged without interrupting inference. `_run_job_inner` seeds a "videos_total=N" checkpoint before calling the estimator so the monitor shows the job's scope from the first poll. Checkpoint cadence is knob-exposed for operators who want to tune the smoothness-vs-write-rate trade-off. - **`neuropose.ingest`** — zip-archive intake utility. `ingest_zip()` extracts a zip of videos into one job directory per video under `$data_dir/in/`, with validation-before-write (path-traversal and absolute-path members rejected, oversize archives rejected at the 20 GB-uncompressed cap), zip-internal and external collision detection reported in one shot, non-video members silently skipped (`.DS_Store`, `README.md`, etc.), and per-job atomic placement via a staging directory + `os.rename`. Nested paths are flattened into job names by joining components with underscores and sanitising unsafe characters — `patient_001/trial_01.mp4` becomes job `patient_001_trial_01`, preserving disambiguation against a sibling `patient_002/trial_01.mp4`. Typed exception hierarchy: `IngestError`, `ArchiveInvalidError`, `ArchiveEmptyError`, `ArchiveTooLargeError`, `JobCollisionError` (with a `.collisions` list of offending names). The running daemon needs no changes — ingested job dirs are picked up on the next poll. - **`neuropose.migrations`** — schema-migration infrastructure for the three top-level serialised payloads (`VideoPredictions`, `JobResults`, `BenchmarkResult`). Every payload carries a `schema_version` field defaulting to `CURRENT_VERSION`; on load, the raw JSON dict is passed through `migrate_video_predictions` / `migrate_job_results` / `migrate_benchmark_result` *before* pydantic validation so files written by older NeuroPose versions upgrade transparently. One shared `CURRENT_VERSION` counter; per-schema migration registries populated via `register_video_predictions_migration(from_version)` and `register_benchmark_result_migration(from_version)` decorators. `JobResults` is a `RootModel` with no envelope of its own, so its migration runs per-entry across the root mapping. The driver raises `FutureSchemaError` for payloads newer than the current build (clear upgrade-NeuroPose message), `MigrationNotFoundError` for missing chain links (indicates a `CURRENT_VERSION` bump that forgot its migration), and logs at INFO on each version advance. Currently at `CURRENT_VERSION = 2`, with registered v1 → v2 migrations for `VideoPredictions` and `BenchmarkResult` that add the optional `provenance` field. - **`neuropose.analyzer.features.procrustes_align`** — Kabsch rigid-alignment helper for pose sequences, plus a `ProcrustesMode` literal (`"per_frame"` | `"per_sequence"`) and a frozen `AlignmentDiagnostics` dataclass (`rotation_deg`, `rotation_deg_max`, `translation`, `translation_max`, `scale`, plus the mode that produced them). Per-sequence mode fits one rigid transform across the whole trial; per-frame fits an independent transform per frame. Optional `scale=True` fits a uniform scale factor for cross-subject comparisons. Wired into every DTW entry point in `neuropose.analyzer.dtw` via a new keyword-only `align: AlignMode = "none"` parameter — `"none"` preserves the 0.1 raw-coordinate behaviour, while `"procrustes_per_frame"` and `"procrustes_per_sequence"` route inputs through `procrustes_align` before DTW runs so the returned distance is rotation- and translation-invariant. Paper C's pipeline is expected to set `align="procrustes_per_sequence"`; see `TECHNICAL.md` Phase 0. - **`neuropose.analyzer.dtw.Representation`** and **`neuropose.analyzer.dtw.NanPolicy`** — two new Literal types exposing orthogonal DTW preprocessing knobs on every entry point. `representation` (on `dtw_all` and `dtw_per_joint`) switches the per-frame feature vector between `"coords"` (the 0.1 default) and `"angles"`, which runs `extract_joint_angles` on the supplied `angle_triplets` first — yielding distances that are translation-, rotation-, and scale-invariant by construction, and directly interpretable in clinical terms. `nan_policy` (on all three entry points) selects `"propagate"` (surface fastdtw's ValueError on NaN — the default), `"interpolate"` (linear fill per feature column), or `"drop"` (remove NaN frames before DTW); the policy is applied consistently whether NaN originated from the angles pipeline or from corrupted upstream coordinates. `dtw_relation` stays a standalone convenience entry point for two-joint displacement DTW; users who prefer a unified API can express the same computation via `dtw_all` with an appropriate pair of angle triplets or run `dtw_relation` directly. - **`neuropose.analyzer.pipeline`** (schemas) — declarative analysis-pipeline configuration and output artifact, parseable from YAML or JSON via pydantic. `AnalysisConfig` captures a full experiment: inputs (primary + optional reference predictions files), preprocessing (person index, with room to grow), optional segmentation (`gait_cycles` / `gait_cycles_bilateral` / `extractor` discriminated union), and a required analysis stage (`dtw` / `stats` / `none` discriminated union). `AnalysisReport` is the runtime output: carries the originating config, a `Provenance` envelope with `analysis_config` populated, per-input summaries, produced segmentations, and an analysis-result payload that mirrors the stage choice (`DtwResults`, `StatsResults`, or `NoResults`). Cross-field invariants — `method="dtw_relation"` requires `joint_i`/`joint_j`, `representation="angles"` requires non-empty `angle_triplets`, `analysis.kind="dtw"` requires `inputs.reference`, `analysis.kind="stats"` refuses a reference — are enforced at parse time via `model_validator` so typos fail in milliseconds instead of after a multi-minute predictions load. `AnalysisReport` carries a `schema_version` field defaulting to `CURRENT_VERSION = 2`, with a new `register_analysis_report_migration` decorator and `migrate_analysis_report` driver in `neuropose.migrations` ready for future schema changes. `run_analysis(config)` loads the named predictions files, applies the configured segmentation, dispatches to the selected analysis kind (DTW, stats, or none), and emits a fully populated `AnalysisReport` whose `Provenance` inherits the inference-time envelope from the primary input with `analysis_config` stamped in, so the report is self-describing even if the source YAML is lost. For DTW runs with segmentation, segments are paired one-to-one by index across primary and reference, truncating to `min(len_primary, len_reference)`; bilateral segmentations emit per-side distances under `"left_heel_strikes[i]"` / `"right_heel_strikes[i]"` labels. `load_config(path)` parses YAML, `save_report(path, report)` writes atomically, and `load_report(path)` rehydrates via the migration chain. Wired to the CLI as `neuropose analyze --config [--output ]` — replaces the placeholder stub that previously returned `EXIT_PENDING`. The CLI surfaces schema violations and YAML parse errors as `EXIT_USAGE=2` with a clear message pointing at the offending file, prints a one-line summary of the run (segmentation counts, analysis kind, per-segment distance count + mean for DTW), and supports `--output`/`-o` to override the report path declared in the config (useful for sweeping a single config over multiple input pairs from a shell loop). Ships three example configs under `examples/analysis/`: `minimal.yaml` (smallest working DTW pipeline), `paper_c_headline.yaml` (representative Paper C config with bilateral gait-cycle segmentation, per-sequence Procrustes, and joint-angle DTW on knee/hip triplets), and `per_joint_debug.yaml` (per-joint DTW breakdown for diagnosing which joint drives an unexpected distance). An integration suite exercises each example against synthetic predictions so schema drift between the YAMLs and the executor fails CI, not silently at run time. Documented in `docs/api/pipeline.md`. - **`neuropose.analyzer.segment.segment_gait_cycles`** and **`segment_gait_cycles_bilateral`** — clinical convenience wrappers over `segment_predictions` that pre-fill a `joint_axis` extractor with gait-appropriate defaults (`joint="rhee"`, `axis="y"`, `min_cycle_seconds=0.4`). The single-side entry point accepts any berkeley_mhad_43 joint name and any spatial axis as a string literal `"x" | "y" | "z"`, plus an `invert` flag for recordings whose vertical axis runs opposite to MeTRAbs's Y-down world-coordinate convention. The bilateral wrapper runs the detection on both `lhee` and `rhee` and returns the two results under `"left_heel_strikes"` / `"right_heel_strikes"` keys — shape-compatible with `VideoPredictions.segmentations` so the dict can be merged in directly. Degrades gracefully on pathological gaits (shuffling, walker-assisted) by returning an empty segments list rather than raising. Closes the gait-cycle segmentation item in `TECHNICAL.md` Phase 0. - **`neuropose.io.Provenance`** — reproducibility envelope for every inference run. Populated automatically by `Estimator.process_video` when the model was loaded via `load_model` (the production path) and attached to the output `VideoPredictions`; propagates from there into `JobResults` (per-video) and `BenchmarkResult` (via the benchmark loop). Captures the MeTRAbs artifact SHA-256 and filename, `tensorflow` / `tensorflow-metal` / `numpy` / `neuropose` / Python versions, and reserved slots for a `seed`, `deterministic` flag (Track 2), and `analysis_config` (Phase 0 YAML pipeline). `None` on the injected-model test path where NeuroPose has no way to fingerprint the supplied artifact. Frozen pydantic model with `extra="forbid"` and `protected_namespaces=()` so the `model_*` field names do not collide with pydantic v2's internal namespace. `_model.load_metrabs_model` now returns a `LoadedModel` dataclass bundling the TF handle with the pinned SHA and filename so the estimator can build the `Provenance` without re-hashing the tarball. - **`neuropose.reset`** — pipeline-wide reset utility for the benchmark / iteration loop. `find_neuropose_processes()` scans the OS process table (via `psutil`) for running `neuropose watch` and `neuropose serve` instances and classifies each as `daemon` or `monitor`. `terminate_processes()` SIGINTs them, polls for graceful exit up to a configurable grace period, and optionally escalates to SIGKILL with `force_kill=True`. `wipe_state()` removes the contents of `$data_dir/in/`, `$data_dir/out/` (including `status.json`), `$data_dir/failed/` (unless `keep_failed=True`), the `.neuropose.lock` file, and any leftover `.ingest_/` staging dirs from interrupted ingests; container directories themselves are preserved so the daemon does not need to recreate them on next startup. `reset_pipeline()` composes the three with one safety guard: if any process survives termination, the wipe phase is skipped and the returned `ResetReport` flags `wipe_skipped_due_to_survivors`, because removing `$data_dir` out from under an active daemon would corrupt its in-flight writes. Surfaced as `neuropose reset` in the CLI with `--yes/-y`, `--keep-failed`, `--force-kill`, `--grace-seconds`, and `--dry-run/-n` flags; the command always prints a preview before prompting (skipped under `--yes`) and returns `EXIT_USAGE=2` when survivors block the wipe. - **`neuropose.benchmark`** — multi-pass inference benchmarking for a single video. `run_benchmark()` runs `process_video` N times (default 5), always discards the first pass as warmup (graph compilation, file-system cache warmup), and aggregates the remaining `PerformanceMetrics` into a `BenchmarkAggregate` with mean / p50 / p95 / p99 per-frame latency, mean throughput, and max peak RSS. `capture_reference=True` additionally preserves the last measured pass's `VideoPredictions` in memory so the `--compare-cpu` CLI flow can diff the `poses3d` arrays between a GPU and CPU run. `compute_poses3d_divergence()` computes the maximum element-wise absolute difference (in millimetres) between two prediction sets, skipping frames with mismatched detection counts and surfacing the `frame_count_compared` so callers can tell if the number is trustworthy. `format_benchmark_report()` renders a human-readable summary for CLI stdout. - **`neuropose.analyzer`** — post-processing subpackage with lazy imports for the heavy dependencies: - `analyzer.dtw` — three DTW entry points (`dtw_all`, `dtw_per_joint`, `dtw_relation`) over fastdtw, with a frozen `DTWResult` dataclass and three orthogonal preprocessing knobs (`align`, `representation`, `nan_policy`). See `RESEARCH.md` for the ongoing methodology investigation. - `analyzer.features` — `predictions_to_numpy`, `normalize_pose_sequence` (uniform and axis-wise), `pad_sequences` (edge-padding), `procrustes_align` (Kabsch rigid alignment, per-frame or per-sequence, optional uniform scaling), `extract_joint_angles` (NaN on degenerate vectors), `extract_feature_statistics` (`FeatureStatistics` frozen dataclass), and a `find_peaks` thin wrapper around `scipy.signal.find_peaks`. - `analyzer.segment` — repetition segmentation for trials in which a subject performs the same movement several times. A three-layer API: `segment_by_peaks` (pure 1D valley-to-valley peak detection on a generic signal), `segment_predictions` (top-level entry point taking a `VideoPredictions` plus an `ExtractorSpec`, converting time-based parameters to frame counts via `metadata.fps`), and `slice_predictions` (split a `VideoPredictions` into one per detected repetition with re-keyed frame names and a rewritten `frame_count`). Gait-specific convenience wrappers `segment_gait_cycles` (single heel) and `segment_gait_cycles_bilateral` (both heels, returning a dict keyed by `"left_heel_strikes"` / `"right_heel_strikes"`) sit above `segment_predictions` with clinical defaults. Ships four extractor factories — `joint_axis`, `joint_pair_distance`, `joint_speed`, and `joint_angle` — plus a `JOINT_NAMES` constant for the berkeley_mhad_43 skeleton with a `joint_index(name)` lookup, so post-processing callers can resolve `"rwri"` → integer without loading the MeTRAbs SavedModel. A matching integration test (`tests/integration/test_joint_names_drift.py`, marked `slow`) loads the real model and asserts the constant still matches, so any upstream skeleton drift fails CI. - **`neuropose.cli`** — Typer-based command-line interface with eight subcommands: `watch` (run the daemon), `process