dataset - Manage datasets locally and in R2
s6 dataset manages named dataset directories. A dataset is any directory that
contains a data.jsonl file. By default the command works under the local root
temp/ and stores remote datasets under the R2 prefix datasets/<name>/.
See also: docs/dataset_storage.md for the storage
model and the lower-level R2 helpers.
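The dataset rule above can be sketched in Python as follows. This is an illustration under stated assumptions, not s6's implementation: find_local_datasets and remote_prefix_for are hypothetical helpers. Discovery is assumed to look only at the immediate children of the root, because the text does not say whether the scan recurses.

```python
from pathlib import Path


def find_local_datasets(root="temp"):
    """Return sorted names of directories directly under root that contain data.jsonl.

    Hypothetical helper; assumes a non-recursive scan of the root.
    """
    base = Path(root)
    if not base.is_dir():
        return []
    return sorted(
        child.name
        for child in base.iterdir()
        if child.is_dir() and (child / "data.jsonl").is_file()
    )


def remote_prefix_for(name, remote_prefix="datasets/"):
    """Map a dataset name to its R2 key prefix, e.g. 'diverse_2' -> 'datasets/diverse_2/'.

    Hypothetical helper; normalizes the prefix so exactly one '/' separates the parts.
    """
    return remote_prefix.rstrip("/") + "/" + name + "/"
```

Stripping the trailing slash before joining keeps the default prefix datasets/ from producing keys such as datasets//diverse_2/.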
Usage
# List local datasets under ./temp (default)
s6 dataset list
# Add the remote list as well
s6 dataset list --remote -b assets -e https://<account>.r2.cloudflarestorage.com
# Upload a local dataset directory
s6 dataset upload diverse_2
# Download a remote dataset into ./temp/diverse_2
s6 dataset download diverse_2
# Delete a remote dataset and its local copy
s6 dataset delete diverse_2 --yes --local
Behavior
- Local datasets are discovered by scanning --root for directories that contain data.jsonl.
- Remote datasets are stored under --remote-prefix/<name>/ and listed one level at a time.
- list shows local datasets by default. --remote adds the remote list on top of the local output. --local-only suppresses the remote list when --remote is used.
- Upload and download transfer files in parallel and print progress by default.
- Without --overwrite, upload and download run a collision preflight and fail before any transfer starts.
- delete refuses to run unless --yes is provided.
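The parallel-with-progress behavior can be pictured as a thread pool that runs one transfer per file. The sketch below assumes a caller-supplied transfer callable and assumes a failed file is reported rather than aborting the whole run; run_parallel is a hypothetical name, and s6 may handle errors differently. The workers parameter plays the role of -w/--workers.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_parallel(items, transfer, workers=8, show_progress=True):
    """Apply transfer(item) to every item on a thread pool; return the items that failed.

    Hypothetical sketch: failures are collected instead of stopping the other transfers.
    """
    failed = []
    total = len(items)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Submit every transfer up front; the pool runs at most `workers` at a time.
        futures = {pool.submit(transfer, item): item for item in items}
        for done, future in enumerate(as_completed(futures), start=1):
            item = futures[future]
            try:
                future.result()
            except Exception as exc:
                failed.append(item)
                if show_progress:
                    print(f"[{done}/{total}] FAILED {item}: {exc}")
            else:
                if show_progress:
                    print(f"[{done}/{total}] {item}")
    return failed
```

Progress lines appear in completion order, not submission order, because as_completed yields each future as soon as it finishes.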
Flags
- --root sets the local dataset root. Default: temp
- --remote-prefix sets the remote dataset prefix. Default: datasets/
- -w, --workers sets the parallel worker count for upload and download. Default: 8
- -b, --bucket sets the R2 bucket. Default: assets or R2_BUCKET
- -e, --endpoint sets the R2 endpoint URL. Default: R2_ENDPOINT, R2_ENDPOINT_URL, or the built-in Cloudflare R2 endpoint
- --region optionally overrides the R2 region.
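One possible way to resolve the bucket and endpoint defaults is sketched below. The precedence (explicit flag, then environment variable, then built-in fallback) and checking R2_ENDPOINT before R2_ENDPOINT_URL are assumptions read from the flag descriptions. resolve_r2_settings is hypothetical, and builtin_endpoint is a placeholder parameter because the built-in Cloudflare R2 endpoint is not spelled out here.

```python
import os


def resolve_r2_settings(bucket=None, endpoint=None, env=None, builtin_endpoint=None):
    """Resolve bucket and endpoint: explicit flag first, then environment, then fallback.

    Hypothetical sketch; the precedence order is an assumption, not s6's documented behavior.
    """
    env = os.environ if env is None else env
    resolved_bucket = bucket or env.get("R2_BUCKET") or "assets"
    resolved_endpoint = (
        endpoint
        or env.get("R2_ENDPOINT")
        or env.get("R2_ENDPOINT_URL")
        or builtin_endpoint  # placeholder for the built-in Cloudflare R2 endpoint
    )
    return resolved_bucket, resolved_endpoint
```

Passing env explicitly keeps the function testable without touching the real process environment.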
Subcommands
list [--remote] [--local] [--local-only]
- --local is accepted explicitly, but local datasets are already listed by default.
- Remote listing uses a one-level R2 prefix listing under --remote-prefix.
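A one-level listing treats each key segment directly below the prefix as a dataset name. With an S3-compatible API, this is what a ListObjectsV2 request with Delimiter="/" returns as CommonPrefixes; the sketch below reproduces that grouping over a plain list of keys so it runs offline. list_dataset_names is a hypothetical helper, and whether s6 issues a delimiter request is an assumption.

```python
def list_dataset_names(keys, remote_prefix="datasets/"):
    """Return dataset names one level below remote_prefix, mimicking a '/'-delimited listing.

    Hypothetical sketch operating on an in-memory list of object keys.
    """
    prefix = remote_prefix.rstrip("/") + "/"
    names = set()
    for key in keys:
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        # Only keys with another '/' below the prefix form a "directory" (common prefix);
        # loose objects directly under the prefix are not datasets.
        if "/" in rest:
            head = rest.split("/", 1)[0]
            if head:
                names.add(head)
    return sorted(names)
```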
upload <name> [--overwrite]
- Uploads --root/<name> to --remote-prefix/<name>/.
- The local directory must exist and contain data.jsonl.
- Without --overwrite, the command fails if any destination object already exists.
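The upload checks can be sketched as a planning step that maps every local file to its destination key and fails on collisions before anything is sent. This assumes a "destination object" means a key the upload would write (not any object under the prefix) and that existing_keys comes from a remote listing; plan_upload is a hypothetical helper.

```python
from pathlib import Path


def plan_upload(root, name, existing_keys, remote_prefix="datasets/", overwrite=False):
    """Map local files to destination keys and fail before any transfer on collisions.

    Hypothetical sketch; existing_keys stands in for a listing of the remote bucket.
    """
    src = Path(root) / name
    # Covers both "directory missing" and "directory lacks data.jsonl".
    if not (src / "data.jsonl").is_file():
        raise FileNotFoundError(f"{src} must exist and contain data.jsonl")
    dest_prefix = remote_prefix.rstrip("/") + "/" + name + "/"
    plan = {
        dest_prefix + path.relative_to(src).as_posix(): path
        for path in sorted(src.rglob("*"))
        if path.is_file()
    }
    if not overwrite:
        collisions = sorted(set(plan) & set(existing_keys))
        if collisions:
            raise FileExistsError(
                f"{len(collisions)} destination object(s) already exist, e.g. {collisions[0]}"
            )
    return plan
```

The returned mapping could then be handed to a worker pool for the actual transfers, so a collision never leaves a half-uploaded dataset behind.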
download <name> [--overwrite]
- Downloads --remote-prefix/<name>/ into --root/<name>.
- If the target directory already contains files and --overwrite is not set, the command fails before downloading.
- After the download, it warns if the folder does not contain data.jsonl.
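The download rules can be sketched the same way: refuse a non-empty target unless overwrite is set, map each remote key to a local path, and check for data.jsonl afterwards. plan_download and warn_if_not_dataset are hypothetical; treating any directory entry as "already has files" and rejecting keys that contain .. are assumptions of this sketch.

```python
from pathlib import Path, PurePosixPath


def plan_download(keys, name, root="temp", remote_prefix="datasets/", overwrite=False):
    """Map remote keys under the dataset prefix to local paths; refuse a non-empty target.

    Hypothetical sketch; keys stands in for a listing of the remote bucket.
    """
    src_prefix = remote_prefix.rstrip("/") + "/" + name + "/"
    target = Path(root) / name
    # Preflight: any existing entry in the target counts as a collision.
    if not overwrite and target.is_dir() and any(target.iterdir()):
        raise FileExistsError(f"{target} already has files; pass --overwrite to replace them")
    plan = {}
    for key in keys:
        if key.startswith(src_prefix) and not key.endswith("/"):
            rel = PurePosixPath(key[len(src_prefix):])
            if ".." in rel.parts:
                raise ValueError(f"refusing unsafe key {key!r}")
            plan[key] = target.joinpath(*rel.parts)
    return plan


def warn_if_not_dataset(target):
    """Post-download check: return True when data.jsonl exists, else print a warning."""
    if (Path(target) / "data.jsonl").is_file():
        return True
    print(f"warning: {target} does not contain data.jsonl")
    return False
```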
delete <name> [-y|--yes] [--local]
- Deletes every object under --remote-prefix/<name>/.
- --local also removes --root/<name>.
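Deletion can be sketched with a dict standing in for the bucket. delete_dataset is hypothetical, as is the choice to ignore a missing local directory when --local is set. Against real S3-compatible storage the deletes would typically be batched, since the S3 DeleteObjects call accepts up to 1,000 keys per request.

```python
import shutil
from pathlib import Path


def delete_dataset(store, name, yes=False, local=False, root="temp", remote_prefix="datasets/"):
    """Delete every key under the dataset prefix from store (a dict standing in for R2).

    Hypothetical sketch; returns the deleted keys.
    """
    if not yes:
        raise SystemExit("refusing to delete without --yes")
    prefix = remote_prefix.rstrip("/") + "/" + name + "/"
    # Collect first, then delete, so the dict is not mutated while iterating over it.
    doomed = [key for key in store if key.startswith(prefix)]
    for key in doomed:
        del store[key]
    if local:
        # Assumption: a missing local copy is not treated as an error.
        shutil.rmtree(Path(root) / name, ignore_errors=True)
    return doomed
```

The trailing slash in the prefix matters: deleting ds must not touch dsx, whose name merely starts with the same characters.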
Examples
# Show both local and remote dataset names
s6 dataset list --remote
# Show only the remote list
s6 dataset list --remote --local-only
# Upload with overwrite
s6 dataset upload trial_002 --overwrite
# Download into a custom local root
s6 dataset download run_net_01 --root ./datasets
# Remove remote and local copies
s6 dataset delete run_net_01 --yes --local