Dataset Storage and R2 Utilities¶
This page documents how s6 stores dataset folders locally and in Cloudflare
R2. A dataset is any folder that contains a data.jsonl file.
Storage model¶
- Local datasets live under the root directory `temp/` by default.
- Remote datasets live under the prefix `datasets/<name>/` by default.
- The `s6 dataset` command works with dataset names.
- The `s6 r2` command works with object keys and prefixes directly.
Credentials and defaults¶
The R2 helpers use S3-compatible credentials and an explicit endpoint.
- Access key: `R2_ACCESS_KEY_ID` or `AWS_ACCESS_KEY_ID`
- Secret key: `R2_SECRET_ACCESS_KEY` or `AWS_SECRET_ACCESS_KEY`
- Region: `R2_REGION_NAME`, `AWS_REGION`, or `AWS_DEFAULT_REGION`
- Bucket default: `assets`
- Endpoint default: `https://1195172285921be7f47e85de5cc4a5ad.r2.cloudflarestorage.com`

The command-line helpers also honor:

- `R2_BUCKET`
- `R2_ENDPOINT` or `R2_ENDPOINT_URL`
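The fallback order above can be sketched as a pure function (an illustrative sketch only; `resolve_r2_config` is a hypothetical helper name, not part of s6):

```python
import os

def resolve_r2_config(env=os.environ):
    """Sketch of the documented environment-variable fallback order.
    Hypothetical helper, not the actual s6 API."""
    return {
        "access_key": env.get("R2_ACCESS_KEY_ID") or env.get("AWS_ACCESS_KEY_ID"),
        "secret_key": env.get("R2_SECRET_ACCESS_KEY") or env.get("AWS_SECRET_ACCESS_KEY"),
        "region": env.get("R2_REGION_NAME") or env.get("AWS_REGION")
            or env.get("AWS_DEFAULT_REGION"),
        # bucket and endpoint have hard-coded defaults
        "bucket": env.get("R2_BUCKET", "assets"),
        "endpoint": env.get("R2_ENDPOINT") or env.get("R2_ENDPOINT_URL")
            or "https://1195172285921be7f47e85de5cc4a5ad.r2.cloudflarestorage.com",
    }
```

Passing an explicit dictionary instead of `os.environ` makes the fallback order easy to verify in isolation.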
s6 r2¶
`s6 r2` is the lower-level object storage interface implemented in `src/s6/app/r2`.
list¶
s6 r2 list [prefix] [--flat]
Lists objects under the optional prefix. Listing is recursive by default; `--flat` switches to one-level listing and prints child prefixes first.
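The one-level (`--flat`) behavior can be sketched against an in-memory key list (a hypothetical `list_one_level` helper, not the real implementation, which queries R2):

```python
def list_one_level(keys, prefix=""):
    """Sketch of one-level listing semantics: return child prefixes
    first, then direct objects under the prefix. Hypothetical helper."""
    prefixes, objects = set(), []
    for k in keys:
        if not k.startswith(prefix):
            continue
        rest = k[len(prefix):]
        if "/" in rest:
            # anything nested deeper collapses into a child prefix
            prefixes.add(prefix + rest.split("/", 1)[0] + "/")
        else:
            objects.append(k)
    return sorted(prefixes) + sorted(objects)
```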
download¶
s6 r2 download <object-or-prefix> [-o DIR] [-w N] [-p] [--overwrite]
- A trailing `/` forces prefix download. Otherwise the command first checks for an exact object key, then falls back to prefix download if children exist.
- Exact object downloads preserve the full key path under the output directory.
- Prefix downloads preserve the relative structure beneath the prefix.
- The default output directory is `temp/`.
- Without `--overwrite`, downloads fail fast if any target file already exists.
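The resolution order can be sketched as a pure function (hypothetical `classify_target` helper; the real command queries R2 rather than an in-memory set):

```python
def classify_target(arg, existing_keys):
    """Sketch of the documented resolution order for `s6 r2 download`.
    existing_keys: set of object keys currently in the bucket."""
    if arg.endswith("/"):
        return "prefix"    # trailing slash forces prefix download
    if arg in existing_keys:
        return "object"    # exact object key is checked first
    if any(k.startswith(arg + "/") for k in existing_keys):
        return "prefix"    # fall back to prefix download if children exist
    return "missing"
```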
upload¶
s6 r2 upload <local-path> <dest-key-or-prefix> [-w N] [-p] [--overwrite]
- If the source is a directory, the tree is uploaded recursively under the destination prefix. If the destination does not end with `/`, one is added for directory uploads.
- If the source is a file and the destination ends with `/`, the basename is appended; otherwise the provided key is used exactly.
- File uploads never overwrite existing objects; `--overwrite` applies only to directory uploads.
- Without `--overwrite`, directory uploads fail fast if any destination key already exists.
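The destination-key rules can be sketched as follows (hypothetical `resolve_dest_key` helper, not the real implementation):

```python
import os

def resolve_dest_key(local_path, dest, is_dir):
    """Sketch of the documented destination rules for `s6 r2 upload`."""
    if is_dir:
        # directory uploads always target a prefix
        return dest if dest.endswith("/") else dest + "/"
    if dest.endswith("/"):
        # file into a prefix: append the source basename
        return dest + os.path.basename(local_path)
    return dest  # explicit key is used exactly
```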
delete¶
s6 r2 delete <key-or-prefix> [-r|--recursive]
Deletes exactly one object unless `--recursive` is provided. A trailing `/` also forces prefix deletion.
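The decision between single-object and prefix deletion can be sketched as (hypothetical `delete_mode` helper):

```python
def delete_mode(arg, recursive=False):
    """Sketch of the documented rule for `s6 r2 delete`: --recursive or
    a trailing '/' forces prefix deletion; otherwise exactly one object
    is deleted. Hypothetical helper."""
    return "prefix" if recursive or arg.endswith("/") else "object"
```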
s6 dataset¶
`s6 dataset` is the named-dataset wrapper implemented in `src/s6/app/dataset.py`.
list¶
s6 dataset list [--remote] [--local] [--local-only]
Lists dataset names, where a dataset is a folder containing `data.jsonl`.

- Local datasets are scanned under `--root` and are shown by default.
- `--remote` adds the remote list under `--remote-prefix`.
- `--local-only` suppresses the remote list.
upload¶
s6 dataset upload <name> [--overwrite]
Uploads `--root/<name>` to `--remote-prefix/<name>/`. Uploads are parallel, with progress enabled by default. Without `--overwrite`, the command fails before uploading if any remote destination key already exists.
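The name-to-location mapping, using the defaults documented above (`temp/` root, `datasets/` remote prefix), can be sketched as (hypothetical `dataset_locations` helper):

```python
def dataset_locations(name, root="temp", remote_prefix="datasets"):
    """Sketch of the documented mapping from a dataset name to its
    local directory and remote key prefix. Hypothetical helper."""
    return f"{root}/{name}", f"{remote_prefix}/{name}/"
```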
download¶
s6 dataset download <name> [--overwrite]
Downloads `--remote-prefix/<name>/` into `--root/<name>`. Downloads are parallel, with progress enabled by default. Without `--overwrite`, the command fails before downloading if any local target file already exists. If the resulting folder does not contain `data.jsonl`, the command prints a warning.
delete¶
s6 dataset delete <name> [-y|--yes] [--local]
Deletes the remote dataset under `--remote-prefix/<name>/`.

- `--yes` is required for confirmation.
- `--local` also removes the local dataset directory.
R2Client¶
`src/s6/utils/r2.py` provides the library layer used by the CLI commands.
Methods¶
- `list(prefix="", recursive=True)`
- `download_file(key, local_path, progress=False)`
- `download_directory(prefix, local_dir, max_workers=8, progress=False, overwrite=False)`
- `upload_file(local_path, key, progress=False)`
- `upload_directory(local_dir, prefix, overwrite=False, max_workers=8, progress=False)`
- `upload_bytes(data, key)`
- `delete_object(key, missing_ok=True)`
Behavior¶
- Listings paginate through `ListObjectsV2`. Non-recursive listings return both objects and child prefixes.
- The directory upload and download helpers use a thread pool for file-level parallelism.
- `upload_file` and `upload_bytes` never overwrite.
- `upload_directory` and `download_directory` preflight collisions when `overwrite=False` and raise `FileExistsError` before any transfer starts.
- Progress output is printed to stderr when enabled.
- The module guards optional `boto3` imports so docs and tests can import it without cloud dependencies installed.
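The collision preflight can be sketched for the download direction (hypothetical `preflight_download` helper; the real method also performs the transfers):

```python
from pathlib import Path

def preflight_download(keys, prefix, local_dir, overwrite=False):
    """Sketch of the documented preflight: when overwrite is False,
    raise FileExistsError before any transfer starts. Hypothetical
    helper, not the real download_directory implementation."""
    # map each object key to its local target, preserving structure
    targets = [Path(local_dir) / k[len(prefix):] for k in keys]
    if not overwrite:
        clashes = [t for t in targets if t.exists()]
        if clashes:
            raise FileExistsError(f"{len(clashes)} target file(s) already exist")
    return targets
```

Checking every target before the first byte moves is what makes the "fail fast" guarantee possible: either no files are written, or all collisions were already ruled out.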
Example workflow¶
# Record a dataset locally
s6 track -i gst -o ./temp/run_net_01 -r -x
# Upload it to shared storage
s6 dataset upload run_net_01
# Download it on another machine
s6 dataset download run_net_01
# Replay the dataset
s6 track -i ./temp/run_net_01 --repeat -x
Troubleshooting¶
- Missing credentials usually means `R2_ACCESS_KEY_ID` and `R2_SECRET_ACCESS_KEY` are unset.
- If an upload or download fails immediately, check whether the destination already exists and whether `--overwrite` is needed.
- For large dataset trees, increase `-w`/`--workers` to use more parallel file transfers.