# Architecture
This page describes how NeuroPose is structured and why. It is the
document to read if you are about to modify the estimator, the daemon,
or the output schema, and want to understand the constraints the
existing design is trying to honour.
## Component overview
NeuroPose is a three-stage pipeline:
```text
┌───────────────────┐     ┌──────────────────┐     ┌───────────────────┐
│ interfacer        │     │ estimator        │     │ analyzer          │
│ (daemon)          │────▶│ (inference)      │────▶│ (post-process)    │
│                   │     │                  │     │                   │
│ watches filesystem│     │ MeTRAbs wrapper  │     │ DTW, features,    │
│ manages job state │     │ per-video worker │     │ classification    │
└───────────────────┘     └──────────────────┘     └───────────────────┘
          │                        │                         │
          ▼                        ▼                         ▼
   status.json +          VideoPredictions           analysis results
   job directories        (validated schema)         (pending commit 10)
```
Each stage is a separate module with one job, and the contracts between
them are defined by validated pydantic schemas in
[`neuropose.io`](api/io.md).
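To make the contract style concrete, here is a minimal sketch in the shape of those schemas. The field names are illustrative assumptions; the real frozen definitions live in `neuropose.io`.
```python
# Hypothetical sketch of the contract style; field names are assumptions,
# the real frozen schemas are defined in neuropose.io.
from pydantic import BaseModel, ConfigDict

class FramePrediction(BaseModel):
    """One frame's pose estimate. Frozen: attribute assignment raises."""
    model_config = ConfigDict(frozen=True)
    frame_index: int
    poses3d: list[list[float]]  # joints x (x, y, z) in millimetres

class VideoPredictions(BaseModel):
    """Everything the estimator hands downstream for one video."""
    model_config = ConfigDict(frozen=True)
    video_path: str
    frames: list[FramePrediction]
```
Loading a malformed file through such a schema fails immediately with a pydantic validation error, which is the behaviour the rest of this page relies on.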
### estimator
**Role:** pure inference library. Given a video path and a MeTRAbs
model, produces a validated `VideoPredictions` object plus a
`PerformanceMetrics` bundle.
**Does NOT handle:** job directories, status files, polling, locking,
signal handling, visualization, or where-to-save decisions. It is a
library, not a daemon.
The estimator streams frames directly from OpenCV into the model — no
intermediate write-to-disk-then-read-back-as-PNG round trip like the
previous prototype had. `process_video()` returns a typed
`ProcessVideoResult` containing the predictions and an
always-populated `PerformanceMetrics` (per-frame latency, peak RSS,
total wall clock, active TF device, TF version, `tensorflow-metal`
detection, and model load time when the caller went through
`load_model()`). It does not touch the filesystem unless the caller
explicitly asks it to save the result.
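A usage sketch under assumed names (the attribute names `predictions` and `metrics`, and the exact argument lists, are illustrative; the API reference below has the real signatures):
```python
# Illustrative only: exact names and signatures are in the estimator API docs.
from neuropose.estimator import load_model, process_video

model = load_model()                    # also records model load time
result = process_video("trial_01.mp4", model)

predictions = result.predictions        # validated VideoPredictions
metrics = result.metrics                # always-populated PerformanceMetrics
print(metrics)                          # latency, peak RSS, TF device, ...
```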
See [`neuropose.estimator`](api/estimator.md) for the API reference.
### benchmark
**Role:** multi-pass inference benchmarking layered on top of the
estimator. `run_benchmark()` calls `process_video` N times, discards
the first pass as warmup, and aggregates the remaining
`PerformanceMetrics` into a `BenchmarkAggregate` with distributional
statistics (mean / p50 / p95 / p99 per-frame latency, mean
throughput, max peak RSS).
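The aggregation itself is simple enough to sketch. This is not the shipped code, just the warmup-discard-then-pool idea, assuming per-frame latencies arrive as one list per pass:
```python
# Sketch of the BenchmarkAggregate idea: drop pass 0 as warmup, pool the rest.
import statistics

def aggregate_latencies(per_pass_ms: list[list[float]]) -> dict[str, float]:
    pooled = sorted(x for run in per_pass_ms[1:] for x in run)  # skip warmup

    def pct(p: float) -> float:
        return pooled[min(len(pooled) - 1, int(p / 100 * len(pooled)))]

    return {
        "mean_ms": statistics.mean(pooled),
        "p50_ms": pct(50),
        "p95_ms": pct(95),
        "p99_ms": pct(99),
    }
```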
The benchmark is exposed via the `neuropose benchmark <video>` CLI
subcommand. Its `--compare-cpu` flag spawns a subprocess with GPU
visibility hidden (via `tf.config.set_visible_devices([], "GPU")`
before any TF op) so a Metal-backed Apple Silicon run can be diffed
against a CPU baseline — both the throughput speedup and the maximum
element-wise `poses3d` divergence in millimetres are surfaced in the
output. This is the "is `tensorflow-metal` producing correct
numerics?" check that `RESEARCH.md`'s TensorFlow-version-compatibility
section leaves open for v0.1.
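The ordering detail matters enough to spell out: the device list can only be restricted before TensorFlow initializes it. In effect, the first statements of the CPU-baseline subprocess are:
```python
# Hide the GPU before any op (or model load) forces TensorFlow to
# initialize its device list; afterwards it is too late.
import tensorflow as tf

tf.config.set_visible_devices([], "GPU")
assert tf.config.get_visible_devices("GPU") == []
# ...only now load the model and run inference for the CPU baseline...
```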
### monitor
**Role:** localhost HTTP dashboard. Reads `$data_dir/out/status.json`
on every request and serves a small HTML page (with an auto-refresh
`<meta>` tag, per-job progress bars, and stale-entry warnings) plus
the raw `StatusFile` as JSON at `/status.json`. Runs as a separate
process from the daemon — operators start it with `neuropose serve`,
it is safe to run alongside `neuropose watch`, and it stays useful
even if the daemon is down (last-known state is shown with the stale
badge flagging any lingering `processing` entries).
Design choices:
- **Pure stdlib.** The server is built on `http.server` with a small
request-handler subclass. No FastAPI, no Flask — this is a localhost
tool and the cost of a framework is not justified.
- **Loopback by default.** Binds to `127.0.0.1:8765` with an explicit
`--host` flag to override. Collaborators on the same machine reach
it directly; collaborators elsewhere should go through an SSH
tunnel or explicitly configured reverse proxy. Binding to
`0.0.0.0` is a real network-exposure decision the operator should
make with eyes open.
- **No cache.** Every request re-reads `status.json`, which is tiny
and already written atomically by the daemon, so no sync protocol
is needed between the two processes (see the sketch after this list).
- **Two surfaces, same data.** `GET /` renders HTML for browsers;
`GET /status.json` returns the raw `StatusFile` for `curl` /
scripted pipelines. `?job=<name>` filters to a single entry for
programmatic consumers that only care about one job.
- **Live progress data comes from the interfacer**, not from the
monitor. The interfacer checkpoints the currently-running job's
`JobStatusEntry` every
`settings.status_checkpoint_every_frames` frames (default 30) via
the estimator's `progress` callback, updating `frames_processed`,
`percent_complete`, and `last_update` on the in-memory status file
and calling `save_status`. The monitor then reads those fields and
renders the progress bar — no separate live channel, no in-memory
cache on the monitor side.
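The first and third choices together fit in a few lines. A minimal sketch of the serving model, not the shipped handler (the real one adds the HTML view, `?job=` filtering, and stale badges; the status path here assumes the default XDG layout):
```python
# Sketch: stdlib-only server that re-reads status.json on every request.
from http.server import BaseHTTPRequestHandler, HTTPServer
from pathlib import Path

STATUS = Path("~/.local/share/neuropose/jobs/out/status.json").expanduser()

class StatusHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path.split("?")[0] == "/status.json":
            body = STATUS.read_bytes()               # no cache: fresh read
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

HTTPServer(("127.0.0.1", 8765), StatusHandler).serve_forever()
```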
### ingest
**Role:** bulk-intake utility. Accepts a zip archive of videos and
produces one job directory per video under `$data_dir/in/`, ready for
the `interfacer` daemon to pick up on its next poll.
The ingester is a **pure library call** (`ingest_zip`) plus a thin
CLI wrapper (`neuropose ingest`). It does not run inference itself —
it only stages files so the existing watch-directory pipeline does
the work. Key guarantees:
- **Validate-before-write.** Path-traversal members, zip bombs, and
empty archives are rejected before any file lands on disk, so a
failed ingest leaves the operator with a clean state.
- **Transactional placement.** Each video is extracted to a staging
directory under `$data_dir/.ingest_<uuid>/`, and only then
atomically renamed into `$data_dir/in/<job_name>/`. The daemon
never sees a half-populated job directory (see the sketch after
this list).
- **Collision detection is up-front and exhaustive.** Zip-internal
collisions (two videos that flatten to the same job name) and
external collisions (a job directory already exists) are reported
as a single error listing every offending name. `--force`
deletes-and-replaces; without it, nothing is written.
- **Flattening preserves disambiguation.** The in-archive path is
joined with underscores into the job name — `patient_001/trial_01.mp4`
→ job `patient_001_trial_01` — so nested organisation survives the
flattening without collapsing into silent collisions.
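The transactional-placement guarantee boils down to one rename. A sketch of the mechanism with a hypothetical helper name (`ingest_zip` does the real work):
```python
# Sketch of transactional placement: stage under the same filesystem,
# then one atomic rename. The daemon never polls a half-written job dir.
import os
import uuid
from pathlib import Path

def place_job(data_dir: Path, job_name: str, extract_into) -> Path:
    staging = data_dir / f".ingest_{uuid.uuid4().hex}"
    staging.mkdir()
    extract_into(staging)              # all writes land in staging first
    dest = data_dir / "in" / job_name
    os.rename(staging, dest)           # atomic within one filesystem
    return dest
```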
The set of accepted extensions comes from
`neuropose.interfacer.VIDEO_EXTENSIONS`, so any format the daemon
can already process is a valid ingest target.
### interfacer
**Role:** job-lifecycle daemon. Watches `input_dir` for new job
subdirectories, dispatches each to an injected `Estimator`, and manages
the persistent `status.json` that tracks every job's lifecycle.
**Owns:** the `input_dir → output_dir → failed_dir` transitions, the
single-instance lock, signal handling, and crash recovery.
**Does NOT handle:** inference — that is the estimator's job, which is
injected via the constructor so tests can supply a fake.
Key guarantees:
- **Single instance.** An exclusive `fcntl.flock` on
`data_dir/.neuropose.lock` blocks a second daemon from running against
the same data directory. The lock is released automatically on
process exit, even on SIGKILL (see the sketch after this list).
- **Crash recovery.** On startup, any status entries left in
`processing` state are marked failed with an "interrupted" error and
their inputs quarantined. The operator decides whether to retry by
moving them back to `input_dir`.
- **Graceful shutdown.** SIGINT and SIGTERM request an orderly stop.
The current job finishes before the loop exits.
- **Structured errors.** Every failed job records a short
`"<ExceptionType>: <message>"` in its status entry so operators have
a grep target without digging through logs.
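The single-instance guarantee is the classic `flock` pattern; roughly (the path assumes the default layout, the message is illustrative):
```python
# Roughly the single-instance lock: the kernel releases a flock when the
# owning process exits, even on SIGKILL, so no stale-lock cleanup is needed.
import fcntl
import os
import sys
from pathlib import Path

data_dir = Path("~/.local/share/neuropose/jobs").expanduser()
lock = open(data_dir / ".neuropose.lock", "w")
try:
    fcntl.flock(lock, fcntl.LOCK_EX | fcntl.LOCK_NB)   # fail fast if held
except BlockingIOError:
    sys.exit("another daemon already owns this data_dir")
lock.write(str(os.getpid()))   # lock file records the owner PID
lock.flush()
```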
See [`neuropose.interfacer`](api/interfacer.md) for the API reference.
### analyzer
**Role:** post-processing. Takes a `results.json` and produces analysis
output (DTW comparisons, joint-angle features, repetition segmentation,
classification). Each piece is a pure function of the predictions, so
the module is a set of testable utilities rather than a daemon.
Three submodules ship today:
- `analyzer.features` — `predictions_to_numpy`, normalization,
padding, joint angles, summary statistics, and a thin
`scipy.signal.find_peaks` wrapper.
- `analyzer.dtw` — three DTW entry points (`dtw_all`, `dtw_per_joint`,
`dtw_relation`) over `fastdtw`, with a frozen `DTWResult` dataclass.
See `RESEARCH.md` for the ongoing methodology discussion.
- `analyzer.segment` — **repetition segmentation**. Given a
`VideoPredictions` of a trial in which the subject performs the
same movement several times (e.g. lifting a cup repeatedly), the
module detects the individual repetitions as
`[start, peak, end)` windows via valley-to-valley peak detection
on a clinically chosen 1D signal. The signal is one of four
extractor variants (`joint_axis`, `joint_pair_distance`,
`joint_speed`, `joint_angle`), and the produced `Segmentation`
carries its own `SegmentationConfig` so the on-disk
representation is self-describing. Segmentation is exposed both
as a Python API and as the `neuropose segment` CLI subcommand,
which runs post-hoc against an existing `results.json` — the
daemon stays a pure inference daemon.
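A toy version of the valley-to-valley idea, assuming the 1D signal has already been extracted (the real module adds the four extractor variants and the `SegmentationConfig` plumbing):
```python
# Toy valley-to-valley repetition detection on an already-extracted 1D signal.
import numpy as np
from scipy.signal import find_peaks

def detect_repetitions(signal: np.ndarray) -> list[tuple[int, int, int]]:
    """Return (start, peak, end) index triples, one per repetition."""
    peaks, _ = find_peaks(signal)
    valleys, _ = find_peaks(-signal)   # valleys are peaks of the negation
    reps = []
    for p in peaks:
        before = valleys[valleys < p]
        after = valleys[valleys > p]
        if before.size and after.size:  # keep only fully bracketed peaks
            reps.append((int(before[-1]), int(p), int(after[0])))
    return reps
```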
Classification wrappers on top of `sktime` are deliberately not
shipped yet; see `RESEARCH.md` for the plan.
## Data flow
```text
                 ┌──────────────────────────┐
                 │ $XDG_DATA_HOME/neuropose/│
                 └──────────────────────────┘
        ┌─────────────────────┼─────────────────────┐
        ▼                     ▼                     ▼
    jobs/in/              jobs/out/            jobs/failed/
        │                     ▲                     ▲
        │ discovered          │ on success          │ on failure
        │                     │                     │
        └─────────▶ process_job ────────────────────┘
                         │
               status.json (atomic)
```
1. The operator drops a video (or several) into
`data_dir/in/<job_name>/`.
2. The daemon detects the new job directory on its next poll.
3. For each video in the job, the estimator runs inference and returns
a `VideoPredictions` object.
4. The daemon aggregates per-video predictions into a `JobResults`
object and writes it to `data_dir/out/<job_name>/results.json`.
5. The status entry is updated to `completed`, with the path to
`results.json` recorded.
6. On catastrophic failure (no videos, decode error, model crash), the
job's input directory is moved to `data_dir/failed/<job_name>/` and
the status entry is updated to `failed` with an error message.
All filesystem writes that affect application state (status file, job
results) go through atomic tmp-file-then-rename helpers in
[`neuropose.io`](api/io.md), so a crash mid-write cannot leave a
truncated file behind.
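The helper behind those writes is small. A sketch of the pattern (the helper name is illustrative; the real functions live in `neuropose.io`):
```python
# tmp-file-then-rename: a crash at any point leaves either the old file
# or the new file on disk, never a truncated one.
import os
from pathlib import Path

def atomic_write_text(path: Path, text: str) -> None:
    tmp = path.with_name(path.name + ".tmp")
    with open(tmp, "w") as f:
        f.write(text)
        f.flush()
        os.fsync(f.fileno())      # durable before it becomes visible
    os.replace(tmp, path)         # atomic replacement on POSIX
```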
## Runtime directory layout
The daemon operates within a single base `data_dir`:
```text
$data_dir/
├── .neuropose.lock          # fcntl lock file; contains owner PID
├── in/
│   ├── job_001/             # operator-created
│   │   ├── video_01.mp4
│   │   └── video_02.mp4
│   └── job_002/
│       └── trial.mov
├── out/
│   ├── status.json          # persistent lifecycle state
│   ├── job_001/
│   │   └── results.json     # aggregated JobResults
│   └── job_002/
│       └── results.json
└── failed/
    └── job_003/             # quarantined inputs
        └── broken_video.mov
```
`data_dir` defaults to `$XDG_DATA_HOME/neuropose/jobs` and **is never
inside the repository.** This is deliberate: the previous prototype kept
job directories under `backend/neuropose/in/`, which is exactly how
subject-identifying data ended up in the same tree that `git add`
sweeps up. The
current design makes it mechanically difficult for subject data to
leak into source control.
Model weights are cached separately at `$XDG_DATA_HOME/neuropose/models/`.
## Design principles
A few choices run through every module and are worth knowing if you
plan to extend the package:
**Immutable schemas.** `FramePrediction` and `VideoMetadata` are
frozen pydantic models. The previous prototype had a bug where its
visualizer mutated `poses3d` in place via a numpy view, invisibly
corrupting the data if you visualized before saving. The frozen schema
makes that class of bug impossible.
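In pydantic terms (a stand-in model, not the real schema):
```python
# Frozen models turn the prototype's silent corruption into a loud error.
from pydantic import BaseModel, ConfigDict, ValidationError

class Frame(BaseModel):                  # stand-in for FramePrediction
    model_config = ConfigDict(frozen=True)
    poses3d: tuple[float, ...]

frame = Frame(poses3d=(1.0, 2.0, 3.0))
try:
    frame.poses3d = (0.0,)               # attribute assignment is rejected
except ValidationError as err:
    print(err)                           # reports the instance is frozen
```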
**Validate at the boundary.** Every load/save helper in `neuropose.io`
validates on entry. Malformed files fail at load time with a pydantic
validation error, not three call sites later as an `AttributeError` on
a missing key.
**Library / daemon separation.** The estimator is pure library — give
it a video and a model, get back validated predictions. The daemon is
the wrapper that adds filesystem semantics. This makes the estimator
trivially testable (inject a fake model, inject any video) and lets
downstream users embed it in other pipelines without inheriting the
daemon's lifecycle.
**Dependency injection.** The `Interfacer` takes its `Estimator` as a
constructor argument. Tests inject fakes; production wires the real
thing. There is no singleton model state.
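In miniature (the `Protocol` shape is illustrative; the real `Interfacer` constructor is in the API reference):
```python
# Constructor injection: production wires the real estimator, tests a fake.
from typing import Protocol

class EstimatorLike(Protocol):
    def process_video(self, path: str) -> dict: ...

class FakeEstimator:
    def process_video(self, path: str) -> dict:
        return {"video": path, "frames": []}   # canned result for tests

class Interfacer:
    def __init__(self, estimator: EstimatorLike) -> None:
        self.estimator = estimator             # no global model state

daemon = Interfacer(FakeEstimator())           # a test double drops right in
```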
**No implicit config discovery.** Configuration is loaded explicitly
via `--config` or environment variables. The previous prototype's
`load_config('config.yaml')` was a relative path footgun — it worked
only when the daemon was launched from a specific directory. The new
`Settings` class refuses to guess.
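A sketch of that shape, assuming `pydantic-settings` (whether the real `Settings` uses it is an assumption; `status_checkpoint_every_frames` is the one field named elsewhere on this page):
```python
# Explicit configuration: environment variables or an explicit --config,
# never a relative-path file lookup that depends on the launch directory.
from pathlib import Path
from pydantic_settings import BaseSettings, SettingsConfigDict

class Settings(BaseSettings):
    model_config = SettingsConfigDict(env_prefix="NEUROPOSE_")
    data_dir: Path = Path("~/.local/share/neuropose/jobs").expanduser()
    status_checkpoint_every_frames: int = 30
```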
**Atomic writes for all stateful files.** Status file, job results,
predictions — every write goes through a tmp-file-then-rename so a
crash mid-write cannot corrupt state.
**Fail fast, fail specifically.** Each module defines a small hierarchy
of typed exceptions (`EstimatorError`, `InterfacerError`, etc.).
Exception types carry semantic meaning; callers can distinguish
recoverable failures from programmer errors.
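The shape of those hierarchies, with illustrative subclass names (`VideoDecodeError` is a hypothetical example, not a documented type):
```python
# Typed exceptions: the type is the contract, the message is the grep target.
class EstimatorError(Exception):
    """Base class for failures the estimator raises deliberately."""

class VideoDecodeError(EstimatorError):
    """Recoverable per job: quarantine the input, keep the daemon alive."""

class InterfacerError(Exception):
    """Base class for daemon lifecycle failures."""
```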