dataset — Manage datasets (local and R2)

Provides a focused CLI for listing, uploading, downloading, and deleting dataset directories. A “dataset” is any folder containing a data.jsonl file (see examples under ./temp). Remote storage is R2, accessed through its S3‑compatible API, with a configurable bucket and endpoint.

See also: docs/dataset_storage.md for motivation, design, and advanced usage.

Usage (selected)

# List local datasets under ./temp (default root)
s6 dataset list

# List remote datasets (one level under the remote prefix)
s6 dataset list --remote -b $R2_BUCKET -e $R2_ENDPOINT

# Upload a local dataset directory (no overwrite by default)
s6 dataset upload diverse_2

# Download a remote dataset into ./temp/diverse_2 (fails fast without --overwrite)
s6 dataset download diverse_2

# Delete a remote dataset (requires --yes)
s6 dataset delete diverse_2 --yes

How it works

  • Local datasets live under a root directory (default ./temp).

  • Remote datasets live under a base prefix (default datasets/) in your bucket; each dataset maps to datasets/<name>/ preserving relative paths.

  • The command wraps the s6.utils.r2 module and uses R2Client for S3‑compatible operations (list, upload, download, delete).

  • Uploads and downloads run in parallel and show progress; without --overwrite, both fail fast if any destination already exists.
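
The path mapping and the fail-fast check described above can be sketched in a few lines of Python. This is an illustration only, not the code in s6.utils.r2: the function names plan_upload and check_no_conflicts are hypothetical, and the set of existing remote keys is passed in directly rather than fetched through R2Client.

```python
from pathlib import Path, PurePosixPath


def plan_upload(root: Path, name: str, remote_prefix: str = "datasets/") -> dict[Path, str]:
    """Map every file under root/name to a key under remote_prefix/name/ (hypothetical helper)."""
    dataset_dir = root / name
    base = PurePosixPath(remote_prefix.strip("/")) / name
    plan = {}
    for path in sorted(dataset_dir.rglob("*")):
        if path.is_file():
            # Preserve the path relative to the dataset folder, using "/" separators.
            rel = path.relative_to(dataset_dir)
            plan[path] = str(base / PurePosixPath(*rel.parts))
    return plan


def check_no_conflicts(plan: dict[Path, str], existing_keys: set[str], overwrite: bool = False) -> None:
    """Fail fast: raise before any transfer starts if a destination key already exists."""
    if overwrite:
        return
    conflicts = sorted(set(plan.values()) & existing_keys)
    if conflicts:
        raise FileExistsError(f"{len(conflicts)} key(s) already exist, e.g. {conflicts[0]}")
```

Under these assumptions, ./temp/diverse_2/data.jsonl maps to the key datasets/diverse_2/data.jsonl, and the conflict check runs on the whole plan before the first transfer.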

Common flags

  • --root — local root for dataset directories (default temp)

  • --remote-prefix — base remote prefix (default datasets/)

  • -w, --workers — parallel workers for upload/download (default 8)

  • R2: -b, --bucket (bucket name); -e, --endpoint (endpoint URL); --region (region, optional)
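
As a rough picture of what the --workers value controls, the sketch below runs transfers on a bounded thread pool. It assumes a thread-based implementation, which the source does not confirm; run_transfers and the transfer callback are hypothetical names standing in for the real upload or download call.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Iterable


def run_transfers(items: Iterable[str], transfer: Callable[[str], None], workers: int = 8) -> int:
    """Call transfer(item) for every item on at most `workers` threads; return the number completed."""
    done = 0
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(transfer, item) for item in items]
        for future in as_completed(futures):
            future.result()  # re-raises the exception from a failed transfer, if any
            done += 1  # a progress display could be updated here
    return done
```

A larger worker count mainly helps with many small files, where per-request latency dominates over bandwidth.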

Subcommands

  • list [--remote] [--local] [--local-only]

    • Show dataset names locally, remotely, or both. Remote listing is one level under --remote-prefix.

  • upload <name> [--overwrite]

    • Upload --root/<name> to --remote-prefix/<name>/. Without --overwrite, aborts if any destination keys already exist.

  • download <name> [--overwrite]

    • Download --remote-prefix/<name>/ into --root/<name>. Without --overwrite, aborts if any local targets already exist. Warns if the downloaded folder lacks data.jsonl.

  • delete <name> [-y|--yes] [--local]

    • Delete the remote dataset; with --local, also remove the local folder.
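
The listing behaviour of the list subcommand can be approximated as follows. This is a sketch under assumptions, not the actual implementation: it treats only immediate subfolders of the root as candidate datasets, and it derives remote names from a plain list of object keys instead of calling R2Client. Both function names are hypothetical.

```python
from pathlib import Path


def list_local_datasets(root: Path) -> list[str]:
    """Names of immediate subfolders of root that contain a data.jsonl file."""
    if not root.is_dir():
        return []
    return sorted(p.name for p in root.iterdir() if p.is_dir() and (p / "data.jsonl").is_file())


def remote_dataset_names(keys: list[str], remote_prefix: str = "datasets/") -> list[str]:
    """Dataset names one level under remote_prefix, derived from object keys."""
    stripped = remote_prefix.strip("/")
    prefix = f"{stripped}/" if stripped else ""
    names = set()
    for key in keys:
        if key.startswith(prefix):
            head, sep, _ = key[len(prefix):].partition("/")
            if head and sep:  # keep only keys that sit inside a <name>/ folder
                names.add(head)
    return sorted(names)
```

With keys such as datasets/diverse_2/data.jsonl and datasets/trial_002/images/0.png, the remote helper yields the names diverse_2 and trial_002, and ignores anything nested deeper or outside the prefix.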

Examples

# Change the remote prefix (e.g., project‑scoped datasets)
s6 dataset list --remote --remote-prefix projects/robotA/

# Upload with overwrite (replace existing keys)
s6 dataset upload trial_002 --overwrite

# Download to a non‑default local root
s6 dataset download run_net_01 --root ./datasets

# Remove remote and local copies
s6 dataset delete run_net_01 --yes --local