dataset - Manage datasets (local and R2)
Provides a focused CLI for listing, uploading, downloading, and deleting
dataset directories. A “dataset” is any folder containing a data.jsonl file
(see examples under ./temp). Remote storage uses an S3‑compatible backend
via R2 with a configurable bucket/endpoint.
See also: docs/dataset_storage.md for motivation, design, and advanced usage.
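The definition above (a dataset is any folder containing a data.jsonl file) can be illustrated with a short Python sketch. This is not the CLI's actual implementation: the function name find_local_datasets is hypothetical, and the sketch assumes datasets sit directly under the root rather than in nested folders.

```python
from pathlib import Path


def find_local_datasets(root="temp"):
    """Return sorted names of immediate subfolders of root that contain data.jsonl.

    Hypothetical helper for illustration; assumes datasets are direct children
    of the root directory.
    """
    root_path = Path(root)
    if not root_path.is_dir():
        return []
    return sorted(
        child.name
        for child in root_path.iterdir()
        if child.is_dir() and (child / "data.jsonl").is_file()
    )
```

Calling find_local_datasets("temp") would then roughly correspond to what a local listing reports, under the assumptions stated above.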
Usage (selected)
# List local datasets under ./temp (default root)
s6 dataset list
# List remote datasets (under the remote prefix, one level)
s6 dataset list --remote -b $R2_BUCKET -e $R2_ENDPOINT
# Upload a local dataset directory (no overwrite by default)
s6 dataset upload diverse_2
# Download a remote dataset into ./temp/diverse_2 (fail‑fast without --overwrite)
s6 dataset download diverse_2
# Delete a remote dataset (requires --yes)
s6 dataset delete diverse_2 --yes
How it works
- Local datasets live under a root directory (default ./temp).
- Remote datasets live under a base prefix (default datasets/) in your bucket; each dataset maps to datasets/<name>/, preserving relative paths.
- The command wraps the s6.utils.r2 module and uses R2Client for S3‑compatible operations (list, upload, download, delete).
- Upload and download are parallelised and show progress; both are fail‑fast when --overwrite is not provided.
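The path mapping and fail‑fast behaviour described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the code in s6.utils.r2: the names plan_upload and check_no_conflicts are hypothetical, and the list of existing remote keys is passed in rather than fetched through R2Client, whose API is not shown here.

```python
from pathlib import Path, PurePosixPath


def plan_upload(root, name, remote_prefix="datasets/"):
    """Map each file under root/name to a remote key, preserving relative paths.

    Hypothetical helper: returns {local_path: remote_key}.
    """
    local_dir = Path(root) / name
    base = PurePosixPath(remote_prefix) / name
    plan = {}
    for path in sorted(local_dir.rglob("*")):
        if path.is_file():
            # Object keys always use forward slashes, whatever the local OS.
            rel = path.relative_to(local_dir).as_posix()
            plan[path] = str(base / rel)
    return plan


def check_no_conflicts(targets, existing, overwrite=False):
    """Fail fast: raise before any transfer starts if a target already exists."""
    conflicts = sorted(set(targets) & set(existing))
    if conflicts and not overwrite:
        raise FileExistsError(
            f"{len(conflicts)} target(s) already exist, e.g. {conflicts[0]}; "
            "pass --overwrite to replace them"
        )
```

Running the conflict check over the whole plan before the first transfer is what makes the operation fail‑fast: either every file is clear to go, or nothing is touched.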
Common flags
- --root: local root for dataset directories (default temp)
- --remote-prefix: base remote prefix (default datasets/)
- -w, --workers: parallel workers for upload/download (default 8)
- R2: -b, --bucket (bucket), -e, --endpoint (endpoint URL), --region (optional)
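For readers who want to see the flags and their defaults in one place, the following argparse sketch mirrors the list above. It is purely illustrative: build_common_parser is a hypothetical name, and nothing here is taken from the actual s6 source beyond the flag names and defaults documented on this page.

```python
import argparse


def build_common_parser():
    """Hypothetical parser mirroring the documented common flags and defaults."""
    parser = argparse.ArgumentParser(prog="s6 dataset")
    parser.add_argument("--root", default="temp",
                        help="local root for dataset directories")
    parser.add_argument("--remote-prefix", default="datasets/",
                        help="base remote prefix")
    parser.add_argument("-w", "--workers", type=int, default=8,
                        help="parallel workers for upload/download")
    # R2 connection settings; no defaults are assumed in this sketch.
    parser.add_argument("-b", "--bucket", help="bucket")
    parser.add_argument("-e", "--endpoint", help="endpoint URL")
    parser.add_argument("--region", default=None, help="optional region")
    return parser
```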
Subcommands
list [--remote] [--local] [--local-only]
  Show dataset names locally, remotely, or both. Remote listing is one level under --remote-prefix.

upload <name> [--overwrite]
  Upload --root/<name> to --remote-prefix/<name>/. Without --overwrite, aborts if any destination keys already exist.

download <name> [--overwrite]
  Download --remote-prefix/<name>/ into --root/<name>. Without --overwrite, aborts if any local targets already exist. Warns if the folder lacks data.jsonl.

delete <name> [-y|--yes] [--local]
  Delete the remote dataset; with --local, also remove the local folder.
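The "one level under --remote-prefix" rule for remote listing can be illustrated by deriving dataset names from a flat list of object keys. This is a sketch with a hypothetical function name (list_remote_datasets); an S3‑compatible backend would typically do the same grouping server‑side by listing with a "/" delimiter, and the actual R2Client call used by the command is not shown here.

```python
def list_remote_datasets(keys, remote_prefix="datasets/"):
    """Derive dataset names (one level under remote_prefix) from flat object keys.

    Hypothetical helper for illustration only.
    """
    prefix = remote_prefix
    if prefix and not prefix.endswith("/"):
        prefix += "/"
    names = set()
    for key in keys:
        if not key.startswith(prefix):
            continue
        # The first path segment after the prefix is the dataset name.
        head, sep, _rest = key[len(prefix):].partition("/")
        if sep and head:  # skip loose files sitting directly under the prefix
            names.add(head)
    return sorted(names)
```

Deeper paths such as datasets/<name>/images/... collapse to the same <name>, so each dataset appears once regardless of how many objects it holds.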
Examples
# Change the remote prefix (e.g., project‑scoped datasets)
s6 dataset list --remote --remote-prefix projects/robotA/
# Upload with overwrite (replace existing keys)
s6 dataset upload trial_002 --overwrite
# Download to a non‑default local root
s6 dataset download run_net_01 --root ./datasets
# Remove remote and local copies
s6 dataset delete run_net_01 --yes --local