# Recipe: Dataset Capture + Dev Loop

_Collect a dataset from the network source and use it to iterate on pipeline code._

This recipe covers two workflows:

- Record a dataset from live network cameras (data collection).
- Replay that dataset in a tight loop to develop and test `_pipeline.py` deterministically.

---

## 0) Prerequisites

- `s6` installed and runnable (`pip install -e .`).
- Start the capture server in another terminal:

  ```bash
  s6 stream
  ```

- Optional but recommended: author camera ordering/ROIs once so frames are consistent:

  ```bash
  s6 id
  ```

- If you have a pipeline config (JSON/YAML), keep it under `configs/` for repeatability.

---

## 1) Record from the network source

Use `track` in record-only mode so frames are saved without running inference.

```bash
# Record from the live network stream into a dataset directory
s6 track -i network -o ./datasets/run_net_01 -r

# Optionally also write logs for environment + performance context
s6 track -i network -o ./datasets/run_net_01 -r -x
```

Notes

- `-i network` attaches to the running `s6 stream` server.
- `-o` sets the dataset output directory (created if missing).
- `-r/--record-only` skips inference for maximal throughput while collecting data.
- With `-x`, a run folder is created under `logs/runs//` with `metrics.json` and a Chrome trace (`perf.log.json`).

---

## 2) Develop and test the pipeline on the dataset

Re-run `track` against your saved dataset. Use `--repeat` to loop playback for quick iteration.

```bash
# Headless, repeat playback, and write logs for profiling
s6 track ./datasets/run_net_01 --repeat -x

# With a UI to visualize overlays while you iterate on code
s6 track ./datasets/run_net_01 --repeat -v

# Pin a specific pipeline config to keep runs consistent
s6 track ./datasets/run_net_01 --repeat -x \
  --config ./configs/pipeline.config.yaml
```

Tips

- Edit `src/s6/app/_pipeline.py`, then re-run the same command to validate changes deterministically.
- For fast performance investigation, open `logs/runs//perf.log.json` in `chrome://tracing` or Perfetto (ui.perfetto.dev).
- Use `s6 perf-stats` to summarize timing from `metrics.json` and compare runs.

---

## 3) Optional: augment or refine the dataset

- If needed, run the filter UI to prune bad samples:

  ```bash
  s6 data filter ./datasets/run_net_01
  ```

- To extend coverage, record additional sessions and merge their directories (the format is append-friendly).

---

## Troubleshooting

- Recording is slow or drops frames: keep `--record-only` and close other heavy apps; consider disabling the UI during collection.
- Replay is too fast or too slow: dataset frames play back as fast as the pipeline processes them; use the profiling logs (`-x`) to spot bottlenecks.
- Results differ between runs: pin your `--config` and environment; ensure calibrations are up to date (`s6 id` and `configs/`).

---

See also

- Tracking app usage: `application/track.md`
- Chrome trace workflow: `recipes/pipeline_chrome_trace.md`
- Stats comparison tool: `application/perf-stats.md`
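
---

Appendix: merging recorded sessions

Section 3 mentions merging additional recording sessions into one dataset directory. A minimal sketch of that merge, demonstrated with throwaway directories rather than real datasets; the frame filenames here are illustrative assumptions, and `cp -n` keeps existing files from being overwritten. Verify on a copy before merging real data.

```bash
# Demo with throwaway directories; substitute your real dataset paths.
# Assumes frame files are uniquely named across sessions (names are illustrative).
a=$(mktemp -d)   # stands in for ./datasets/run_net_01
b=$(mktemp -d)   # stands in for a second session, e.g. ./datasets/run_net_02
touch "$a/frame_0001.png" "$b/frame_0002.png"

# Merge session b into session a; -n never overwrites existing frames.
cp -rn "$b"/. "$a"/
ls "$a"
```

If filenames could collide across sessions, check for overlaps first (for example, compare sorted `ls` output of both directories) before relying on `-n` to silently skip duplicates.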