dataset - Manage datasets locally and in R2

s6 dataset manages named dataset directories. A dataset is any directory that contains a data.jsonl file. By default the command works under the local root temp/ and stores remote datasets under the R2 prefix datasets/<name>/.

See also: docs/dataset_storage.md for the storage model and the lower-level R2 helpers.

Usage

# List local datasets under ./temp (default)
s6 dataset list

# Add the remote list as well
s6 dataset list --remote -b assets -e https://<account>.r2.cloudflarestorage.com

# Upload a local dataset directory
s6 dataset upload diverse_2

# Download a remote dataset into ./temp/diverse_2
s6 dataset download diverse_2

# Delete a remote dataset and its local copy
s6 dataset delete diverse_2 --yes --local

Behavior

  • Local datasets are discovered by scanning --root for directories that contain data.jsonl.

  • Remote datasets are stored under --remote-prefix/<name>/. Listing them is a one-level prefix listing, so it returns dataset names rather than every object inside each dataset.

  • list shows local datasets by default.

  • --remote prints the remote list in addition to the local output.

  • --local-only suppresses the remote list when --remote is used.

  • Upload and download operations are parallel and print progress by default.

  • Without --overwrite, upload and download run a preflight check for collisions and fail before any transfer starts.

  • delete refuses to run unless --yes is provided.
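
The local discovery rule can be approximated in plain shell, which is handy for checking what list should report. This is a minimal sketch, not the tool's implementation: the function name list_local_datasets is hypothetical, and it assumes that only directories directly under the root are considered.

```shell
# Hypothetical helper: print the names of directories directly under a root
# that contain a data.jsonl file. Assumes datasets are not nested deeper.
list_local_datasets() {
  root="${1:-temp}"
  for dir in "$root"/*/; do
    # If the glob matches nothing it stays literal, and the -f test fails.
    if [ -f "${dir}data.jsonl" ]; then
      basename "$dir"
    fi
  done
}
```

Calling list_local_datasets with no argument scans temp, matching the documented default for --root.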

Flags

  • --root sets the local dataset root. Default: temp

  • --remote-prefix sets the remote dataset prefix. Default: datasets/

  • -w, --workers sets the parallel worker count for upload and download. Default: 8

  • -b, --bucket sets the R2 bucket. Default: assets, or the value of the R2_BUCKET environment variable

  • -e, --endpoint sets the R2 endpoint URL. Default: the R2_ENDPOINT or R2_ENDPOINT_URL environment variable, or the built-in Cloudflare R2 endpoint

  • --region optionally overrides the R2 region
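
As a rough illustration of how the endpoint default might be resolved, the sketch below picks the first non-empty value among the -e flag, R2_ENDPOINT, and R2_ENDPOINT_URL. The function name resolve_endpoint is hypothetical, and the precedence is an assumption taken from the order in which the defaults are listed. The built-in endpoint is not reproduced here, so a non-zero return stands in for it.

```shell
# Hypothetical sketch of endpoint resolution. Assumed order: explicit flag
# value, then R2_ENDPOINT, then R2_ENDPOINT_URL.
resolve_endpoint() {
  flag_value="${1:-}"
  for candidate in "$flag_value" "${R2_ENDPOINT:-}" "${R2_ENDPOINT_URL:-}"; do
    if [ -n "$candidate" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  # Nothing was set: a caller would fall back to the built-in endpoint here.
  return 1
}
```

With this ordering, exporting R2_ENDPOINT once in the shell profile would make -e unnecessary on each invocation, provided the real tool resolves its defaults the same way.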

Subcommands

  • list [--remote] [--local] [--local-only]

    • --local is accepted explicitly, but local datasets are already listed by default.

    • Remote listing uses a one-level R2 prefix listing under --remote-prefix.

  • upload <name> [--overwrite]

    • Uploads --root/<name> to --remote-prefix/<name>/.

    • The local directory must exist and contain data.jsonl.

    • Without --overwrite, the command fails if any destination object already exists.

  • download <name> [--overwrite]

    • Downloads --remote-prefix/<name>/ into --root/<name>.

    • If the target directory already contains files and --overwrite is not set, the command fails before downloading anything.

    • After download, it warns if the folder does not contain data.jsonl.

  • delete <name> [-y|--yes] [--local]

    • Deletes every object under --remote-prefix/<name>/.

    • --local also removes --root/<name>.
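
The download preflight rule can be sketched as a small shell check. This is an illustration under stated assumptions, not the command's actual code: the function name download_preflight and the error text are hypothetical, and the sketch assumes that files in nested subdirectories also count as collisions.

```shell
# Hypothetical helper: succeed only if it is safe to download into a target
# directory. An existing directory that already holds files blocks the
# download unless overwrite is requested ("yes" as the second argument).
download_preflight() {
  target="$1"
  overwrite="${2:-no}"
  if [ "$overwrite" = "yes" ] || [ ! -d "$target" ]; then
    return 0
  fi
  # Look for at least one regular file anywhere below the target (assumption).
  if [ -n "$(find "$target" -type f | head -n 1)" ]; then
    echo "error: $target already contains files; use --overwrite" >&2
    return 1
  fi
  return 0
}
```

Because the check runs before any transfer, a failure leaves the existing directory untouched, which is the behavior described for download without --overwrite.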

Examples

# Show both local and remote dataset names
s6 dataset list --remote

# Pass --remote but suppress the remote list (local datasets only)
s6 dataset list --remote --local-only

# Upload with overwrite
s6 dataset upload trial_002 --overwrite

# Download into a custom local root
s6 dataset download run_net_01 --root ./datasets

# Remove remote and local copies
s6 dataset delete run_net_01 --yes --local