# Deployment

This page covers running NeuroPose in production — on a research server, in a container, or as a managed system service. The target audience is whoever is actually setting up the pipeline for a study.

!!! warning "Data handling policy"
    Before deploying NeuroPose against subject data, read the (pending) `docs/data-policy.md` — it describes the IRB constraints on retention, sharing, and derived-data handling. If you are reading this before the data policy has landed, pause and ask the project lead before proceeding.

## Choosing a deployment mode

| Mode | Use when | Notes |
| --- | --- | --- |
| Local (bare) | Developer machine, one-off experiments | Fastest feedback loop. Use `neuropose process`. |
| Systemd service | Single-host lab server | Recommended for study runs. Auto-restart, log capture, clean shutdown. |
| Docker | Shared infra, CI pipelines, reproducible runs | Image build is pending commit 12. |
| Kubernetes | Multi-study labs with shared GPU pools | Not currently supported; would layer on top of the Docker image. |

## Local (bare-metal)

For one-off processing, the CLI is enough:

```bash
neuropose --config ./config.yaml process path/to/video.mp4
```

For batch mode, run the daemon in a tmux or screen session:

```bash
tmux new -s neuropose
neuropose --config ./config.yaml --verbose watch
# Ctrl-B D to detach
```

## Systemd user service

A systemd user unit (not a root-privileged one) is the right way to run the daemon on a research server where the researcher owns the job queue.

Create `~/.config/systemd/user/neuropose.service`:

```ini
[Unit]
Description=NeuroPose job daemon
After=network-online.target

[Service]
Type=simple
WorkingDirectory=%h/neuropose
Environment=XDG_DATA_HOME=%h/.local/share
ExecStart=%h/neuropose/.venv/bin/neuropose --config %h/neuropose/config.yaml watch
Restart=on-failure
RestartSec=10

[Install]
WantedBy=default.target
```

Enable it:

```bash
systemctl --user daemon-reload
systemctl --user enable --now neuropose.service
journalctl --user -u neuropose.service -f
```
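One systemd detail worth knowing: user services are normally stopped when the owning user's last session ends. If the daemon should keep running after you log out, enable lingering for the account (standard systemd behaviour, nothing NeuroPose-specific):

```bash
loginctl enable-linger "$USER"
```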

The interfacer's `fcntl`-based lock file prevents a second daemon from starting if systemd restarts the service before the first instance has fully released the lock.

## Docker

Pending commit 12. The plan is to ship two Dockerfiles:

- `Dockerfile` — CPU base, suitable for small studies.
- `Dockerfile.gpu` — CUDA base derived from `tensorflow/tensorflow:<pinned>-gpu`.

Both images will have the `neuropose` command as their `ENTRYPOINT`, so they can be invoked as:

```bash
docker run --rm \
  -v /srv/neuropose:/data \
  -e NEUROPOSE_DATA_DIR=/data/jobs \
  -e NEUROPOSE_MODEL_CACHE_DIR=/data/models \
  ghcr.io/.../neuropose:latest \
  watch
```
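If the GPU image lands as planned, containers will also need to see the host GPUs through the NVIDIA container runtime. With a current Docker plus nvidia-container-toolkit that is typically one extra flag; a sketch, reusing the placeholder image name from above:

```bash
docker run --rm --gpus all \
  -v /srv/neuropose:/data \
  -e NEUROPOSE_DATA_DIR=/data/jobs \
  -e NEUROPOSE_MODEL_CACHE_DIR=/data/models \
  ghcr.io/.../neuropose:latest \
  watch
```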

## GPU considerations

- NeuroPose delegates device selection to TensorFlow via the `device` field in `Settings` (`"/CPU:0"` or `"/GPU:0"`). There is no multi-GPU dispatch yet; a single daemon instance uses a single device.
- If you need to run inference on multiple GPUs in parallel, run one daemon per GPU with distinct `data_dir` values and divide jobs between them (see the sketch after this list). The fcntl lock is keyed on the data directory, so separate daemons on separate data dirs do not conflict.
- The first call to `Estimator.process_video` triggers the MeTRAbs model load, which in turn initializes the TensorFlow GPU runtime. Expect a one-time startup delay of several seconds.
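A minimal sketch of the one-daemon-per-GPU pattern. The config file names are placeholders; each config points at its own `data_dir` and keeps the `device` field at `"/GPU:0"`, while `CUDA_VISIBLE_DEVICES` (a CUDA/TensorFlow mechanism, not a NeuroPose option) restricts which physical GPU each process can see:

```bash
# Daemon for GPU 0: gpu0.yaml points at its own data_dir
CUDA_VISIBLE_DEVICES=0 neuropose --config ./gpu0.yaml watch &

# Daemon for GPU 1: gpu1.yaml points at a different data_dir
CUDA_VISIBLE_DEVICES=1 neuropose --config ./gpu1.yaml watch &
```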

## Log management

The daemon writes to stdlib logging. Under systemd, logs land in the user journal. For other deployment modes, redirect stdout/stderr to your log collector of choice — NeuroPose writes one line per event with a structured `%(asctime)s %(levelname)-8s %(name)s: %(message)s` format, which any log aggregator can parse.
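For a bare-metal run outside systemd, one low-effort option is to pipe the daemon's output into the system log via `logger(1)`; a sketch (the tag is arbitrary):

```bash
neuropose --config ./config.yaml watch 2>&1 | logger -t neuropose
```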

Log verbosity is controlled via the CLI:

```bash
neuropose --verbose watch   # DEBUG
neuropose watch             # INFO (default)
neuropose --quiet watch     # WARNING
```

## Monitoring

The canonical state of the daemon lives in `$data_dir/out/status.json`, a JSON object keyed by job name. A tiny Prometheus exporter or a nightly cron that tails the file is enough to alert on stuck jobs. A richer monitoring story is out of scope for v0.1.
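As a sketch of the nightly-cron idea: the per-job schema inside `status.json` is defined by the daemon, so treat the field name below as an assumption and check a real file on your install before wiring up alerts.

```bash
# Hypothetical check: prints the names of jobs whose entry is marked as failed.
# "$DATA_DIR" stands for your configured data_dir; the "state" field name is an
# assumption, adjust it to match the actual status.json schema.
jq -r 'to_entries[] | select(.value.state == "failed") | .key' "$DATA_DIR/out/status.json"
```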

## Backups and retention

Two things are worth backing up:

1. `$data_dir/out/*/results.json` — the aggregated predictions for each job. These are the outputs of the research process.
2. `$data_dir/out/status.json` — the daemon's record of which jobs ran when, which failed, and why.
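A minimal sketch of a copy step that picks up only these two artifacts (the source path and destination are placeholders; adapt them to wherever your `data_dir` lives):

```bash
rsync -av --prune-empty-dirs \
  --include='out/' \
  --include='out/status.json' \
  --include='out/*/' \
  --include='out/*/results.json' \
  --exclude='*' \
  /srv/neuropose/ backup-host:/srv/neuropose-backup/
```

Because everything else falls under the trailing `--exclude='*'`, the `in/` and `failed/` directories are never traversed, which matters for the policy below.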

Do not back up `$data_dir/in/` or `$data_dir/failed/` indiscriminately. These contain source video files that may be IRB-protected subject data, and your backup store may not be covered by the same data-handling agreement as the primary server. Consult the (pending) `docs/data-policy.md` before designing a retention plan.