# dataset — Manage datasets (local and R2)

Provides a focused CLI for listing, uploading, downloading, and deleting dataset directories. A “dataset” is any folder containing a `data.jsonl` file (see examples under `./temp`). Remote storage uses an S3-compatible backend via R2 with a configurable bucket/endpoint.

See also: `docs/dataset_storage.md` for motivation, design, and advanced usage.

## Usage (selected)

```bash
# List local datasets under ./temp (default root)
s6 dataset list

# List remote datasets (under the remote prefix, one level)
s6 dataset list --remote -b "$R2_BUCKET" -e "$R2_ENDPOINT"

# Upload a local dataset directory (no overwrite by default)
s6 dataset upload diverse_2

# Download a remote dataset into ./temp/diverse_2 (fail-fast without --overwrite)
s6 dataset download diverse_2

# Delete a remote dataset (requires --yes)
s6 dataset delete diverse_2 --yes
```

## How it works

- Local datasets live under a root directory (default `./temp`).
- Remote datasets live under a base prefix (default `datasets/`) in your bucket; each dataset maps to `datasets/<name>/`, preserving relative paths.
- The command wraps :mod:`s6.utils.r2` and uses `R2Client` for S3-compatible operations (list, upload, download, delete).
- Upload/download are parallelised and show progress; both are fail-fast when `--overwrite` is not provided.

## Common flags

- `--root` — local root for dataset directories (default `temp`)
- `--remote-prefix` — base remote prefix (default `datasets/`)
- `-w, --workers` — parallel workers for upload/download (default `8`)
- R2: `-b, --bucket` (bucket), `-e, --endpoint` (endpoint URL), `--region` (optional)

## Subcommands

- `list [--remote] [--local] [--local-only]`
  - Show dataset names locally, remotely, or both. Remote listing is one level under `--remote-prefix`.
- `upload <name> [--overwrite]`
  - Upload `--root/<name>` to `--remote-prefix/<name>/`. Without `--overwrite`, aborts if any destination keys already exist.
- `download <name> [--overwrite]`
  - Download `--remote-prefix/<name>/` into `--root/<name>`. Without `--overwrite`, aborts if any local targets already exist. Warns if the folder lacks `data.jsonl`.
- `delete <name> [-y|--yes] [--local]`
  - Delete the remote dataset; with `--local`, also remove the local folder.

## Examples

```bash
# Change the remote prefix (e.g., project-scoped datasets)
s6 dataset list --remote --remote-prefix projects/robotA/

# Upload with overwrite (replace existing keys)
s6 dataset upload trial_002 --overwrite

# Download to a non-default local root
s6 dataset download run_net_01 --root ./datasets

# Remove remote and local copies
s6 dataset delete run_net_01 --yes --local
```
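
The rules described under "How it works" (a dataset is a folder with `data.jsonl`, remote keys preserve relative paths under the prefix, and transfers abort early on conflicts) can be illustrated with a small Python sketch. This is not the code in `s6.utils.r2`: the function names `find_local_datasets`, `plan_upload`, and `check_no_conflicts` are hypothetical, and no real `R2Client` call is made. It only models the path and conflict logic on the local filesystem.

```python
from __future__ import annotations

from pathlib import Path, PurePosixPath


def find_local_datasets(root: Path) -> list[str]:
    """Return names of direct subfolders of `root` that contain data.jsonl."""
    if not root.is_dir():
        return []
    return sorted(p.name for p in root.iterdir() if (p / "data.jsonl").is_file())


def plan_upload(root: Path, name: str, remote_prefix: str = "datasets/") -> dict[str, Path]:
    """Map each destination key to the local file it would be uploaded from.

    Keys take the form <remote_prefix>/<name>/<relative path>, with forward slashes.
    """
    dataset_dir = root / name
    base = PurePosixPath(remote_prefix) / name
    return {
        str(base / f.relative_to(dataset_dir).as_posix()): f
        for f in sorted(dataset_dir.rglob("*"))
        if f.is_file()
    }


def check_no_conflicts(planned_keys, existing_keys, overwrite: bool = False) -> None:
    """Fail fast: raise before any transfer if a destination already exists."""
    conflicts = sorted(set(planned_keys) & set(existing_keys))
    if conflicts and not overwrite:
        raise FileExistsError(
            f"{len(conflicts)} destination(s) already exist, e.g. {conflicts[0]}; "
            "pass --overwrite to replace them"
        )
```

In this sketch the conflict check runs once over the whole plan before any file is touched, which is one way to get the all-or-nothing behaviour the CLI describes when `--overwrite` is absent. The actual implementation may order or batch these steps differently.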