# Changelog

All notable changes to NeuroPose are recorded in this file. The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/), and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [Unreleased]

This section covers the ground-up rewrite of NeuroPose. The entries below describe the difference between the previous internal prototype and the state of the repository at the first tagged release, and will be split into per-release sections once tagging begins.

### Added

#### Package structure and tooling

- `src/neuropose/` package layout with `py.typed` marker, MIT `LICENSE`, policy-enforcing `.gitignore`, pinned Python 3.11 (`.python-version`), and `pyproject.toml` with full project metadata, classifiers, and URL pointers. The runtime TensorFlow dependency is pinned to `tensorflow>=2.16,<3.0` — see *Changed* below for the rationale. `psutil>=5.9` is a runtime dependency used by the estimator's always-on `PerformanceMetrics` collection to sample peak RSS.
- `[project.optional-dependencies].analysis` extra for fastdtw, scipy, scikit-learn, and sktime — install via `pip install neuropose[analysis]`.
- `[project.optional-dependencies].metal` extra pulling `tensorflow-metal>=1.2,<2` under `sys_platform == 'darwin' and platform_machine == 'arm64'` environment markers. Opt-in only via `pip install 'neuropose[metal]'` or `uv sync --extra metal`; silently a no-op on every non-Apple-Silicon platform. The Metal path is **not** exercised in CI and is documented as experimental in `docs/getting-started.md` — users enabling it are expected to spot-check numerics against the CPU path before trusting results downstream.
- `[dependency-groups].dev` (PEP 735) with the full dev + docs + analyzer toolchain: pytest, pytest-cov, ruff, pyright, pre-commit, mkdocs-material, mkdocstrings, fastdtw, and scipy. `uv sync --group dev` gives contributors everything needed to run the whole suite.
- `AUTHORS.md`, `CITATION.cff` (with a MeTRAbs upstream `references:` entry), and an MIT-licensed `LICENSE` with an explicit MeTRAbs attribution paragraph.
- Pre-commit configuration (`.pre-commit-config.yaml`) running ruff, ruff-format, gitleaks (secret scanning), a large-files hook with a 500 KB limit, end-of-file fixers, trailing-whitespace fixers, and YAML/TOML/JSON validators. Pyright is deliberately **not** in pre-commit — it runs in CI only, so pre-commit stays fast.
- Ruff configuration in `pyproject.toml` with a deliberately broad rule selection (pycodestyle, pyflakes, isort, bugbear, pyupgrade, simplify, ruff-specific, pep8-naming, comprehensions, pathlib, pytest-style, tidy-imports, numpy-specific, pydocstyle with numpy convention). Per-file ignores for tests and private modules.
- Pyright configuration in `standard` mode (not `strict` — TF/OpenCV stubs would otherwise drown the signal). Unknown-type reports are explicitly silenced until the TensorFlow version pin is settled.
- Pytest configuration with strict markers, an opt-in `slow` marker, and a `--runslow` CLI flag implemented in `tests/conftest.py::pytest_collection_modifyitems` so integration tests stay out of the default run (see the sketch just below this list).
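
The `--runslow` opt-in follows the standard pytest pattern. A minimal sketch of how such a hook is typically wired; the actual `tests/conftest.py` may differ in details:

```python
# tests/conftest.py (sketch): skip "slow"-marked tests unless --runslow is given.
import pytest


def pytest_addoption(parser):
    parser.addoption(
        "--runslow", action="store_true", default=False, help="run slow integration tests"
    )


def pytest_collection_modifyitems(config, items):
    if config.getoption("--runslow"):
        return  # flag given: leave slow tests in the collection
    skip_slow = pytest.mark.skip(reason="needs --runslow option to run")
    for item in items:
        if "slow" in item.keywords:
            item.add_marker(skip_slow)
```
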
#### CI / infrastructure

- GitHub Actions workflow `.github/workflows/ci.yml` running three parallel jobs — **lint** (ruff), **typecheck** (pyright), and **test** (pytest) — on every push and PR to `main`. Uses `uv` with a pinned version (`0.9.16`) and cache-enabled setup for fast reruns. Concurrency control cancels superseded runs on the same branch.
- GitHub Actions workflow `.github/workflows/docs.yml` that builds the mkdocs-material site on every relevant push and uploads the rendered site as a 14-day workflow artifact. GitHub Pages deployment is intentionally not wired up yet; the workflow header comment describes what to add when the repo flips public.

#### Runtime modules

- **`neuropose.config`** — `Settings` class built on `pydantic-settings`. Field-level validation for `device`, `poll_interval_seconds`, and `default_fov_degrees`; explicit `from_yaml()` classmethod (no implicit config-file discovery); XDG defaults for `data_dir` and `model_cache_dir` (`~/.local/share/neuropose/…`) so runtime data never lives inside the repository; `ensure_dirs()` as an explicit method so construction remains filesystem-side-effect-free.
- **`neuropose.io`** — validated on-disk schemas plus atomic persistence helpers:
    - Prediction schemas: `FramePrediction` (frozen), `VideoMetadata` (frame count, fps, width, height), `VideoPredictions` (metadata envelope + frames mapping + optional `segmentations` field), `JobResults`, the `JobStatus` enum, `JobStatusEntry` (with a structured `error` field), and `StatusFile`.
    - Performance schema: frozen `PerformanceMetrics` carrying per-call timings (`model_load_seconds`, `total_seconds`, `per_frame_latencies_ms`), `peak_rss_mb`, `active_device`, `tensorflow_metal_active`, and `tensorflow_version`; `BenchmarkResult` pairing a discarded `warmup_pass` with `measured_passes` and a `BenchmarkAggregate` (mean / p50 / p95 / p99 per-frame latency, mean throughput, max peak RSS); an optional `CpuComparisonResult` nested inside `BenchmarkResult` for `--compare-cpu` runs, carrying both device aggregates, the throughput speedup, and the maximum element-wise `poses3d` divergence in millimetres.
    - Segmentation schema: frozen `Segment` windows (`start`, `end`, `peak`), `SegmentationConfig` (with a `method` version literal, e.g. `valley_to_valley_v1`), a discriminated `ExtractorSpec` union over `JointAxisExtractor`, `JointPairDistanceExtractor`, `JointSpeedExtractor`, and `JointAngleExtractor`, and `Segmentation` pairing a config with its segments so on-disk results are self-describing.
    - Load and save helpers with an atomic tmp-file-then-rename pattern for every state file; `load_benchmark_result` / `save_benchmark_result` follow the same pattern. `load_status` is deliberately crash-resilient: missing, corrupt, or non-mapping JSON returns an empty `StatusFile` rather than raising. Legacy predictions files without the `segmentations` field deserialize cleanly to an empty mapping.
- **`neuropose.estimator`** — `Estimator` class that streams frames directly from OpenCV into the model, with no intermediate write-to-disk-then-read-back-as-PNG round trip. Returns a typed `ProcessVideoResult` containing a validated `VideoPredictions` object and an always-populated `PerformanceMetrics` bundle (per-frame latency in ms, total wall clock, peak RSS via `psutil`, active TF device string, `tensorflow-metal` detection, TF version, and model load time when the caller went through `load_model()`). Does not touch the filesystem. Constructor accepts an injected model for testability; `load_model()` delegates to `neuropose._model.load_metrabs_model()`. Typed exception hierarchy: `EstimatorError`, `ModelNotLoadedError`, `VideoDecodeError`. Optional per-frame `progress` callback for long videos. Frame identifier convention is `frame_000000` (six-digit zero-pad, no extension — no file is implied).
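
  A minimal usage sketch of that flow. `Estimator`, `load_model()`, `process_video`, and the `progress` callback are named in this changelog; the constructor arguments, the callback signature, and the result attribute names (`predictions`, `metrics`) shown here are assumptions, not verified API.

  ```python
  # Sketch only: attribute names on ProcessVideoResult are assumed, not confirmed.
  from pathlib import Path

  from neuropose.estimator import Estimator

  estimator = Estimator()   # a pre-built model can be injected here for tests
  estimator.load_model()    # fetches and loads the pinned MeTRAbs SavedModel

  result = estimator.process_video(
      Path("videos/trial_042.mp4"),
      progress=lambda done, total: print(f"{done}/{total} frames"),  # per-frame callback (signature assumed)
  )
  print(result.predictions.metadata.frame_count)  # validated VideoPredictions
  print(result.metrics.peak_rss_mb)               # always-populated PerformanceMetrics
  ```
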
- **`neuropose.visualize`** — `visualize_predictions()` for per-frame 2D + 3D overlay rendering. `matplotlib.use("Agg")` is called inside the function rather than at module import, so `import neuropose.visualize` has no global side effect. Explicit deep-copy of `poses3d` before axis rotation to prevent the aliasing bug from the previous prototype. Supports `frame_indices` for rendering a subset of frames.
- **`neuropose.interfacer`** — `Interfacer` job-lifecycle daemon with dependency-injected `Settings` and `Estimator`. Single-instance enforcement via `fcntl.flock` on `data_dir/.neuropose.lock`. Crash-recovery `recover_stuck_jobs()` that marks any status entries left in `processing` state as failed with an "interrupted" message and quarantines their inputs. Graceful shutdown on SIGINT/SIGTERM with an interruptible sleep. Structured error fields on every failed job. `run_once()` factored out of the main loop so tests can drive single iterations without threading. Quarantine collision handling (`job_a.1`, `job_a.2`, …) and an empty-directory silent-skip heuristic (mid-copy directories are not marked failed).
- **`neuropose._model`** — MeTRAbs model loader. Downloads the pinned tarball from the upstream RWTH Aachen URL (`metrabs_eff2l_y4_384px_800k_28ds.tar.gz`), verifies its SHA-256 checksum, atomically extracts to a staging directory and renames into place, and loads via `tf.saved_model.load`. Streams the download and hash computation in 1 MB chunks so memory stays flat. One automatic retry on SHA-256 mismatch (in case the previous download was truncated). Post-load interface check for `detect_poses`, `per_skeleton_joint_names`, and `per_skeleton_joint_edges`.
- **`neuropose.benchmark`** — multi-pass inference benchmarking for a single video. `run_benchmark()` runs `process_video` N times (default 5), always discards the first pass as warmup (graph compilation, file-system cache warmup), and aggregates the remaining `PerformanceMetrics` into a `BenchmarkAggregate` with mean / p50 / p95 / p99 per-frame latency, mean throughput, and max peak RSS. `capture_reference=True` additionally preserves the last measured pass's `VideoPredictions` in memory so the `--compare-cpu` CLI flow can diff the `poses3d` arrays between a GPU and a CPU run. `compute_poses3d_divergence()` computes the maximum element-wise absolute difference (in millimetres) between two prediction sets, skipping frames with mismatched detection counts and surfacing `frame_count_compared` so callers can tell whether the number is trustworthy. `format_benchmark_report()` renders a human-readable summary for CLI stdout.
- **`neuropose.analyzer`** — post-processing subpackage with lazy imports for the heavy dependencies:
    - `analyzer.dtw` — three DTW entry points (`dtw_all`, `dtw_per_joint`, `dtw_relation`) over fastdtw, with a frozen `DTWResult` dataclass. See `RESEARCH.md` for the ongoing methodology investigation.
    - `analyzer.features` — `predictions_to_numpy`, `normalize_pose_sequence` (uniform and axis-wise), `pad_sequences` (edge-padding), `extract_joint_angles` (NaN on degenerate vectors), `extract_feature_statistics` (`FeatureStatistics` frozen dataclass), and a `find_peaks` thin wrapper around `scipy.signal.find_peaks`.
    - `analyzer.segment` — repetition segmentation for trials in which a subject performs the same movement several times. A three-layer API: `segment_by_peaks` (pure 1D valley-to-valley peak detection on a generic signal), `segment_predictions` (top-level entry point taking a `VideoPredictions` plus an `ExtractorSpec`, converting time-based parameters to frame counts via `metadata.fps`), and `slice_predictions` (split a `VideoPredictions` into one per detected repetition with re-keyed frame names and a rewritten `frame_count`). Ships four extractor factories — `joint_axis`, `joint_pair_distance`, `joint_speed`, and `joint_angle` — plus a `JOINT_NAMES` constant for the berkeley_mhad_43 skeleton with a `joint_index(name)` lookup, so post-processing callers can resolve `"rwri"` → integer without loading the MeTRAbs SavedModel. A matching integration test (`tests/integration/test_joint_names_drift.py`, marked `slow`) loads the real model and asserts the constant still matches, so any upstream skeleton drift fails CI. A usage sketch follows below.
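
  A sketch of how the three segmentation layers compose. The function names come from this changelog; the extractor-factory keyword arguments and the exact `segment_predictions` / `slice_predictions` signatures are illustrative guesses, not the verified API.

  ```python
  # Sketch only: keyword names and signatures are assumptions.
  from neuropose.analyzer.segment import (
      joint_axis,
      joint_index,
      segment_predictions,
      slice_predictions,
  )
  from neuropose.io import VideoPredictions

  predictions: VideoPredictions = ...  # from Estimator.process_video() or loaded from a results file

  spec = joint_axis(joint="rwri", axis=1)                      # track the right wrist along one axis
  segmentation = segment_predictions(predictions, spec)        # valley-to-valley windows; time params -> frames via metadata.fps
  repetitions = slice_predictions(predictions, segmentation)   # one re-keyed VideoPredictions per repetition

  print(joint_index("rwri"), len(repetitions))                 # skeleton lookup without loading the SavedModel
  ```
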
- **`neuropose.cli`** — Typer-based command-line interface with five subcommands: `watch` (run the daemon), `process