s6.utils.r2

Thin Cloudflare R2 client (S3‑compatible).

Provides a minimal wrapper around boto3 for listing, uploading, downloading, and deleting objects in Cloudflare R2 using the S3 API. It is deliberately small and explicit, with sensible defaults and safeguards against accidental overwrites.

Key concepts

  • Uses standard S3 credentials and custom endpoint_url for R2.

  • Environment variables can supply credentials/region when arguments are omitted: R2_ACCESS_KEY_ID/AWS_ACCESS_KEY_ID, R2_SECRET_ACCESS_KEY/AWS_SECRET_ACCESS_KEY, and optionally R2_REGION_NAME/AWS_REGION/AWS_DEFAULT_REGION.

  • Upload helpers default to no‑overwrite behaviour; pass overwrite=True where supported to replace existing objects.
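The fallback order described above can be sketched in plain Python. This is an illustrative sketch, not the module's code: `first_set` and `resolve_settings` are hypothetical names, and the assumption that R2_* variables take precedence over AWS_* ones is inferred only from the order in which they are listed.

```python
import os
from typing import Mapping, Optional, Tuple


def first_set(env: Mapping[str, str], *names: str) -> Optional[str]:
    """Return the first non-empty value among the named variables, else None."""
    for name in names:
        value = env.get(name)
        if value:
            return value
    return None


def resolve_settings(
    access_key_id: Optional[str] = None,
    secret_access_key: Optional[str] = None,
    region_name: str = "auto",
    env: Mapping[str, str] = os.environ,
) -> Tuple[Optional[str], Optional[str], str]:
    """Explicit arguments win; environment variables fill the gaps."""
    key_id = access_key_id or first_set(env, "R2_ACCESS_KEY_ID", "AWS_ACCESS_KEY_ID")
    secret = secret_access_key or first_set(
        env, "R2_SECRET_ACCESS_KEY", "AWS_SECRET_ACCESS_KEY"
    )
    if region_name == "auto":
        # Assumed order: R2-specific name first, then the generic AWS names.
        region_name = (
            first_set(env, "R2_REGION_NAME", "AWS_REGION", "AWS_DEFAULT_REGION")
            or "auto"
        )
    return key_id, secret, region_name
```

Passing `env` explicitly keeps the sketch easy to exercise without touching the real process environment.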

Motivation

Experiments often produce large datasets and intermediate artifacts that need to be shared between local development machines, remote teammates, and lab hardware. Cloudflare R2 provides inexpensive S3‑compatible object storage without egress fees. Wrapping it behind a tiny, documented interface lets the pipeline move data in and out of a storage backend it knows little about, with no dependency on a vendor‑specific SDK: the only requirement is the ubiquitous S3 API.

This module aims to be:

  • Minimal: only the operations we actually use (list, get/put, delete).

  • Predictable: no silent overwrites by default; explicit overwrite=True when you intend to replace data.

  • Portable: credentials via env or args; works with R2 and any S3‑compatible endpoint (e.g., MinIO) by changing endpoint_url.

Design

The R2Client is a thin façade over a boto3 S3 client:

  • Construction resolves credentials/region from arguments or environment variables and creates a namespaced S3 client using a custom endpoint_url.

  • Listings use ListObjectsV2 with pagination (ContinuationToken) and optional delimiter semantics when recursive=False. Results are returned as a list of R2Object plus any child prefixes.

  • Upload helpers refuse to overwrite existing keys. upload_bytes and upload_file always raise FileExistsError if the key is present; upload_directory does the same unless you pass overwrite=True.

  • download_directory recreates the relative structure under a prefix.

  • _exists issues a HEAD request and interprets common not‑found error codes; other errors are re‑raised to the caller.

  • To keep documentation builds importable without cloud dependencies, boto3 imports are guarded. Instantiating R2Client fails fast with a clear ImportError if boto3 is unavailable.
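The paginated listing described in the Design notes might look roughly like the loop below. It is a sketch under assumptions, not the module's implementation: `list_all` is a hypothetical name, and `client` is assumed to be any object exposing boto3's `list_objects_v2` with its usual request and response keys.

```python
from typing import Any, Dict, List, Tuple


def list_all(
    client: Any, bucket: str, prefix: str = "", recursive: bool = True
) -> Tuple[List[Dict[str, Any]], List[str]]:
    """Collect every page of a ListObjectsV2 listing."""
    kwargs: Dict[str, Any] = {"Bucket": bucket, "Prefix": prefix}
    if not recursive:
        # With a delimiter, deeper keys are rolled up into CommonPrefixes.
        kwargs["Delimiter"] = "/"
    objects: List[Dict[str, Any]] = []
    prefixes: List[str] = []
    while True:
        page = client.list_objects_v2(**kwargs)
        objects.extend(page.get("Contents", []))
        prefixes.extend(p["Prefix"] for p in page.get("CommonPrefixes", []))
        if not page.get("IsTruncated"):
            break
        # Ask for the next page, starting where this one stopped.
        kwargs["ContinuationToken"] = page["NextContinuationToken"]
    return objects, prefixes
```

The raw response dictionaries are returned here for brevity; the real client wraps each entry in an R2Object.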

class s6.utils.r2.R2Object(key: str, size: int, last_modified: Any, etag: str)

Bases: object

Lightweight object metadata returned by listings.

key

Object key (path) within the bucket.

Type:

str

size

Object size in bytes.

Type:

int

last_modified

Timestamp of last modification (datetime from boto3). Typing kept generic to avoid a hard dependency in the public API.

Type:

Any

etag

Entity tag (usually an MD5 for non‑multipart uploads).

Type:

str

class s6.utils.r2.R2Client(access_key_id: str | None = None, secret_access_key: str | None = None, bucket_name: str = '', endpoint_url: str = '', region_name: str = 'auto', session: Session | None = None)

Bases: object

Minimal S3‑compatible client for Cloudflare R2.

Parameters:
  • access_key_id (str, optional) – Access key ID. When None, resolved from environment variables R2_ACCESS_KEY_ID or AWS_ACCESS_KEY_ID.

  • secret_access_key (str, optional) – Secret access key. When None, resolved from R2_SECRET_ACCESS_KEY or AWS_SECRET_ACCESS_KEY.

  • bucket_name (str, optional) – Target bucket name.

  • endpoint_url (str, optional) – R2 endpoint URL, e.g. "https://<account>.r2.cloudflarestorage.com".

  • region_name (str, optional) – Region name. If set to "auto" (default), resolved from R2_REGION_NAME/AWS_REGION/AWS_DEFAULT_REGION when present.

  • session (boto3.session.Session, optional) – Existing boto3 session to reuse. A new session is created by default.

Examples

Create a client and perform common operations:

client = R2Client(
    access_key_id="...",
    secret_access_key="...",
    bucket_name="my-bucket",
    endpoint_url="https://<account>.r2.cloudflarestorage.com",
)

client.upload_file("local.txt", "folder/remote.txt")
client.download_file("folder/remote.txt", "downloaded.txt")
client.delete_object("folder/remote.txt")
objects, prefixes = client.list(prefix="folder/", recursive=False)
list(prefix: str = '', recursive: bool = True) → tuple[List[R2Object], List[str]]

List objects and pseudo‑directories under a prefix.

Parameters:
  • prefix (str, optional) – Key prefix to list under, e.g., "folder/sub/".

  • recursive (bool, optional) – If True (default), returns all objects under prefix. If False, treats "/" as a delimiter and returns only one level, also returning sub‑prefixes.

Returns:

A pair (objects, prefixes). objects contains metadata for each object; prefixes lists child “directories” (only when recursive=False).

Return type:

(list[R2Object], list[str])
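To illustrate what recursive=False returns, the delimiter behaviour can be emulated locally on a plain list of keys. This is only a model of the semantics (the real grouping is done server-side by the S3 API); `one_level` is a hypothetical helper.

```python
from typing import Iterable, List, Tuple


def one_level(
    keys: Iterable[str], prefix: str = "", delimiter: str = "/"
) -> Tuple[List[str], List[str]]:
    """Split keys under prefix into direct objects and child prefixes."""
    objects: List[str] = []
    prefixes: List[str] = []
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        head, sep, _ = rest.partition(delimiter)
        if sep:
            # The key lives deeper down: report only its first-level "directory".
            child = prefix + head + delimiter
            if child not in prefixes:
                prefixes.append(child)
        else:
            objects.append(key)
    return objects, prefixes
```

For the keys "folder/a.txt", "folder/sub/b.txt" and "folder/sub/c.txt", listing "folder/" this way yields one object ("folder/a.txt") and one child prefix ("folder/sub/").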

download_file(key: str, local_path: str, *, progress: bool = False) → None

Download an object to a local file.

Parameters:
  • key (str) – Object key in the bucket.

  • local_path (str) – Local filesystem path to write to.

download_directory(prefix: str, local_dir: str, *, max_workers: int = 8, progress: bool = False, overwrite: bool = False) → None

Download all objects under a prefix into a local directory, optionally in parallel.

Preserves the relative structure underneath prefix.

Parameters:
  • prefix (str) – Key prefix to download (e.g., "folder/sub/").

  • local_dir (str) – Destination local directory. Created if missing.
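One way to picture "preserves the relative structure" is as a mapping from object key to local path. The helper below is a hypothetical sketch of that mapping, not the module's code; it assumes keys use "/" as separator and that everything after the prefix becomes the path below local_dir.

```python
from pathlib import Path


def local_path_for(key: str, prefix: str, local_dir: str) -> Path:
    """Map an object key under prefix to a path below local_dir."""
    if not key.startswith(prefix):
        raise ValueError(f"{key!r} is not under prefix {prefix!r}")
    # Strip the prefix, then any leading separator left behind.
    relative = key[len(prefix):].lstrip("/")
    # Rebuild the path with the local OS separator.
    return Path(local_dir).joinpath(*relative.split("/"))
```

With prefix "folder/sub/" and local_dir "out", the key "folder/sub/a/b.bin" maps to out/a/b.bin.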

get_object_bytes(key: str) → bytes

Download an object into memory (as bytes).

Parameters:

key (str) – Object key.

Returns:

Object data.

Return type:

bytes

upload_file(local_path: str, key: str, *, progress: bool = False) → None

Upload a local file without overwriting existing objects.

Parameters:
  • local_path (str) – Path to the local file.

  • key (str) – Destination object key in the bucket.

Raises:

FileExistsError – If an object already exists at key.
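A no-overwrite upload can be approximated with a HEAD check followed by the upload. The sketch below assumes a boto3-style client (`head_object`, `upload_file`) and errors carrying a botocore-style `response` dictionary. The function names and the exact set of not-found codes are assumptions, not taken from the module.

```python
from typing import Any

# Assumed set of codes that mean "no such object"; the module's list may differ.
NOT_FOUND_CODES = {"404", "NoSuchKey", "NotFound"}


def exists(client: Any, bucket: str, key: str) -> bool:
    """Return True if a HEAD request finds the key, False if it is missing."""
    try:
        client.head_object(Bucket=bucket, Key=key)
        return True
    except Exception as exc:
        # botocore's ClientError exposes the error code under response["Error"]["Code"].
        code = str(getattr(exc, "response", {}).get("Error", {}).get("Code", ""))
        if code in NOT_FOUND_CODES:
            return False
        raise  # permissions, network, etc. are surfaced to the caller


def upload_file_no_overwrite(client: Any, bucket: str, local_path: str, key: str) -> None:
    """Upload local_path to key, refusing to replace an existing object."""
    if exists(client, bucket, key):
        raise FileExistsError(f"object already exists: {key}")
    client.upload_file(local_path, bucket, key)
```

A check followed by a write is not atomic: two writers racing on the same key could both pass the check. The guard protects against accidents, not against concurrent uploads.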

upload_directory(local_dir: str, prefix: str, *, overwrite: bool = False, max_workers: int = 8, progress: bool = False) → None

Upload a directory tree under a prefix, optionally in parallel.

Preserves the relative path structure under local_dir. Keys are normalised to use / separators.

Parameters:
  • local_dir (str) – Path to the local directory whose contents to upload.

  • prefix (str) – Destination key prefix (e.g., "folder/sub/").

  • overwrite (bool, optional) – If False (default), raise FileExistsError when a destination key already exists. If True, existing objects are replaced.

  • max_workers (int, optional) – Number of worker threads for parallel uploads (default 8).

  • progress (bool, optional) – If True, prints a simple aggregate progress indicator to stderr.

Raises:
  • NotADirectoryError – If local_dir is not a directory.

  • FileExistsError – If overwrite is False and any destination key exists.
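The key normalisation for directory uploads can be sketched as a planning step that pairs each local file with its destination key. `plan_uploads` is a hypothetical helper written for illustration; it assumes only regular files are uploaded and that the prefix is joined to the relative path with a single "/".

```python
from pathlib import Path
from typing import Dict


def plan_uploads(local_dir: str, prefix: str) -> Dict[str, Path]:
    """Map destination keys to the local files that would be uploaded."""
    root = Path(local_dir)
    if not root.is_dir():
        raise NotADirectoryError(local_dir)
    base = prefix.rstrip("/")
    plan: Dict[str, Path] = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            # as_posix() yields "/" separators even on Windows.
            relative = path.relative_to(root).as_posix()
            key = f"{base}/{relative}" if base else relative
            plan[key] = path
    return plan
```

Computing the full plan up front also makes it possible to check every destination key for an existing object before any upload starts, which is one way to honour overwrite=False without leaving a partial upload behind.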

upload_bytes(data: bytes, key: str) → None

Upload raw bytes as an object without overwriting.

Parameters:
  • data (bytes) – Bytes to upload.

  • key (str) – Destination object key.

Raises:

FileExistsError – If an object already exists at key.

delete_object(key: str, missing_ok: bool = True) → None

Delete an object.

Parameters:
  • key (str) – Object key to delete.

  • missing_ok (bool, optional) – If True (default), succeed even if the object is missing. If False, raise an error when the object does not exist.

Raises:

FileNotFoundError – If missing_ok is False and the object does not exist.