s6.vision package¶

Submodules¶

s6.vision.camera module¶

Camera model and image-space transforms for Sense Core.

This module defines a differentiable pinhole camera with optional lens distortion and a suite of utilities to transform points and images between world, camera, and pixel coordinate systems. It is written on top of PyTorch and supports batched inputs, autograd, and GPU execution where applicable.

Key concepts¶

Intrinsic matrix K: maps normalized camera coordinates to pixel space.
Extrinsic matrix E (world-to-camera): rigid transform of world points.
Distortion parameters: radial/tangential or fisheye polynomial models.
Coordinate transforms: transform/transform_inv, project, unproject for moving between spaces; warping helpers for points/images.

All point-like APIs accept tensors with an arbitrary leading batch shape and operate on the last dimension, e.g. (..., 3) for 3D and (..., 2) for 2D. Decorators in this module handle reshaping and type guarantees.

class s6.vision.camera.Camera(intrinsic: Tensor | ndarray | None = None, extrinsic: Tensor | None = None, fov: float | None = None, distortion: Tensor | ndarray | None = None, resolution: Tuple[int, int] = (1024, 1024), requires_grad: bool = False, fisheye: bool = False, near: float = 0.01, far: float = 1000.0, dtype: dtype = torch.float32)¶

Bases: object

Pinhole camera with optional lens distortion and PyTorch ops.

The camera stores intrinsics, extrinsics (world-to-camera), and optional lens distortion parameters. Methods provide common transformations between world, camera, and pixel spaces; image/point warping; and look-at/tilting helpers for virtual camera control.

Most methods accept batched tensors and preserve leading dimensions.

classmethod calculate_intrinsic_matrix_focal_length(focal_length: float, sensor_size: float, resolution: Tuple[int, int], vertical: bool = True) → Tuple[float, float, float, float]¶: Calculate the intrinsic matrix using focal length and sensor size.

classmethod calculate_intrinsic_matrix_fov(fov: float, resolution: Tuple[int, int]) → Tuple[float, float, float, float]¶: Calculate the intrinsic matrix using field of view.

checkerboard(grid_size)¶

Generate a checkerboard pattern as a numpy array.

Parameters:

resolution (tuple) – The resolution of the image (height, width).
grid_size (int) – The size of each square in the grid.

Returns:

The checkerboard pattern as a 2D numpy array.

Return type:

numpy.ndarray

checkerboard_with_aruco(grid_size, aruco_dict=None)¶

Generate a checkerboard pattern with ArUco markers in the black tiles.

Parameters:

resolution (tuple) – The resolution of the image (height, width).
grid_size (int) – The size of each square in the grid.
aruco_dict (cv2.aruco_Dictionary) – The dictionary of ArUco markers to use.

Returns:

The checkerboard pattern with ArUco markers as a 2D numpy array.

Return type:

numpy.ndarray

create_intrinsic_matrix(intrinsic_params) → Tensor¶: create a 3x3 intrinsic matrix from intrinsic parameters cx,cy,fx,fy, to ensure that the intrinsic matrix remain an intrinsic matrix after gradient update.

property cx: Tensor¶: Principal point x-coordinate (in pixels).

property cy: Tensor¶: Principal point y-coordinate (in pixels).

denormalize(points: Tensor) → Tensor¶

Denormalize normalized coordinates back to pixel coordinates.

Parameters:: points (Tensor) – Tensor of shape (…, 2), where the last dimension represents normalized (x, y) coordinates.
Returns:: Pixel coordinates.
Return type:: Tensor

distort_points(points: Tensor)¶: Apply non-fisheye distortion to 2D points using PyTorch.

distort_points_fisheye(points: Tensor) → Tensor¶

Apply fisheye distortion to 2D points using PyTorch.

Parameters:: points (torch.Tensor) – Tensor of shape (*batch_shape, …, 2), 2D points.
Returns:: Distorted 2D points.
Return type:: torch.Tensor

property distortion: Tensor¶: Distortion coefficient vector.

property dtype¶

property extrinsic: Tensor¶: 4x4 world-to-camera transform matrix E.

property extrinsic_inv: Tensor¶: Inverse of the extrinsic matrix (camera-to-world).

eye() → Camera¶: Return an indentical version of this Camera instance with extrinsic set to identity

property far¶

property fisheye¶

property forward¶

classmethod from_dict(dict_obj: Dict[str, Any]) → Camera¶

Deserialize a dictionary to create a Camera instance.

Parameters:: dict_obj (dict) – A dictionary containing serialized Camera attributes.
Returns:: A new instance of Camera initialized with the provided attributes.
Return type:: Camera

static from_homogeneous(points_h: Tensor) → Tensor¶

Convert points from homogeneous coordinates to Cartesian coordinates using PyTorch.

Parameters:: points_h – A torch.Tensor of points in homogeneous coordinates. The shape is (…, N+1).
Returns:: Points in Cartesian coordinates with shape (…, N).

classmethod from_physical_parameters(resolution, sensor_size, focal_length, **kwargs) → Camera¶

property fx: Tensor¶: Focal length along x (in pixels).

property fy: Tensor¶: Focal length along y (in pixels).

property hfov: Tensor¶: Return the horizontal field of view in degrees.

property intrinsic: Tensor¶: 3x3 intrinsic matrix K.

property intrinsic_inv: Tensor¶: Inverse of the 3x3 intrinsic matrix K^{-1}.

look_at()¶

Create a rotated Camera instance facing the new view target with a specified up direction.

Parameters:

target (torch.Tensor) – Target points with compatible batch shape (…, 3)
up (torch.Tensor, optional) – Up direction vector with a shape of (3,), defaults to (0, -1, 0)

Return type:

New Camera instance

look_at_uv(uv: Tensor) → Camera¶

Adjust the camera to look at a 2D point or multiple points specified in image coordinates.

Parameters:: uv (torch.Tensor) – Target points of shape (…, 2), supports arbitrary batch shape.
Returns:: A new Camera instance looking at the specified point(s).
Return type:: Camera

property near¶

normalize(points: Tensor) → Tensor¶

Normalize pixel coordinates to the [0, 1) range.

Parameters:: points (Tensor) – Tensor of shape (…, 2), where the last dimension represents (x, y) coordinates.
Returns:: Normalized coordinates in the range [0, 1).
Return type:: Tensor

project(points: Tensor) → Tensor¶

Project points from the camera’s view space to the camera’s pixel space using PyTorch tensors.

Parameters:: points – A torch.Tensor of points. The shape is (…, 3).
Returns:: Projected points in the camera’s pixel space.

property projection_matrix: Tensor¶: 3x4 projection matrix P = K [R|t].

property requires_grad: bool¶

resize(new_resolution)¶: Change the camera’s resolution using PyTorch. :param new_resolution: New resolution as a tuple (width, height). :return: New Camera instance with updated resolution and adjusted intrinsic matrix.

property resolution: Tuple[int, int]¶: Image resolution as (H, W).

property rotation_matrix: Tensor¶: 3x3 rotation part of the extrinsic matrix (world-to-camera).

classmethod rotation_matrix_from_axis_angle(axis, angle)¶: Compute the rotation matrix from an axis and an angle (Rodrigues’ rotation formula)

tilt(angle: float) → Camera¶

Tilt the Camera instance by rotating its extrinsic matrix along the local z-axis.

Parameters:: angle (float) – Tilt angle in radians.
Return type:: New Camera instance with the tilted extrinsic matrix.

to_dict() → Dict[str, Any]¶

Serialize the Camera instance to a dictionary.

Returns:: A dictionary containing all serializable attributes of the Camera.
Return type:: dict

static to_homogeneous(points: Tensor) → Tensor¶

Convert points to homogeneous coordinates using PyTorch.

Parameters:: points – A torch.Tensor of points. The shape is (…, N).
Returns:: Points in homogeneous coordinates with shape (…, N+1).

transform(points: Tensor) → Tensor¶

Convert points from world coordinates to the camera’s view space using PyTorch tensors.

Parameters:: points – A torch.Tensor of points. The shape is (…, 3).
Returns:: Transformed points in the camera’s view space.

transform_inv(points: Tensor) → Tensor¶

Convert points from the camera’s coordinate system to the world coordinate system.

Parameters:: points – A numpy array of points in the camera’s coordinate system. The shape is (…, 3).
Returns:: Points in the world coordinate system.

property translation: Tensor¶: Camera translation vector in world coordinates (from E^{-1}).

undistort_points(points: Tensor, iterations=5)¶: Iteratively undistort points using the non-fisheye model.

undistort_points_fisheye(points: Tensor) → Tensor¶

Fisheye undistort 2D points using PyTorch.

Parameters:: points (torch.Tensor) – Tensor of shape (*batch_shape, …, 2), 2D points.
Returns:: Undistorted 2D points.
Return type:: torch.Tensor

unproject(points: Tensor) → Tensor¶

Unproject 2D points from the image plane to 3D space in normalized homogeneous coordinates using PyTorch tensors.

Parameters:: points – A torch.Tensor of 2D points. The shape is (…, 2).
Returns:: Points in 3D space as normalized homogeneous coordinates.

property vfov: Tensor¶: Return the vertical field of view in degrees.

warpImage(image: Tensor, dest: Camera, border_mode='zeros') → Tensor¶

Apply perspective warp from this camera to the destination camera to a batch of images.

Parameters:

image (torch.Tensor) – Input image tensor.
dest (Camera) – Destination camera with no relative translation.
border_mode (str, optional) – Border mode for interpolation (‘constant’, ‘nearest’, ‘reflect’, or ‘wrap’).

Returns:

Warped image.

Return type:

torch.Tensor

warpPoints(points: Tensor, dest: Camera) → Tensor¶

Warp points from this camera’s view to the destination camera’s view.

Parameters:

points (torch.Tensor) – Tensor of shape (*batch_shape, …, 2), 2D points.
dest (Camera) – Destination camera with no relative translation.

Returns:

Warped 2D points in the destination camera’s view.

Return type:

torch.Tensor

zoom(new_fov: float, new_resolution: List[float])¶: Adjust the camera’s field of view using PyTorch. :param new_hfov: New horizontal field of view in degrees (optional). :param new_vfov: New vertical field of view in degrees (optional). :return: New Camera instance with updated intrinsic matrix.

s6.vision.camera.ensure_tensor(shape: List[int] | None = None, dtype: dtype = torch.float32)¶

Decorator to coerce the first non-self argument to a torch.Tensor.

Accepts NumPy arrays, Python lists, or scalars and moves them to the specified dtype. Optionally validates the exact shape. Intended for helper functions that may be called with heterogeneous input types.

Parameters:

shape (list[int] | None) – If provided, require the input tensor to match this shape exactly.
dtype (torch.dtype) – Target dtype for the coerced tensor (default: torch.float32).

s6.vision.camera.meshgrid(height, width) → Tensor¶

Create an (H, W, 2) grid of pixel centers for sampling operations.

The grid stores subpixel-centered coordinates (x+0.5, y+0.5) suitable for image sampling and warping.

Returns:: A tensor of shape (height, width, 2) containing the pixel coordinates.
Return type:: torch.Tensor

s6.vision.camera.transform_operator(input_dims, output_dims)¶

Decorator to broadcast point transforms over arbitrary batch shapes.

The wrapped function must accept a tensor of shape (N, input_dims) and return a tensor of shape (N, output_dims). This decorator allows callers to pass shapes like (..., input_dims). Inputs are flattened to 2D, passed to the wrapped function, then reshaped back to (..., output_dims).

Parameters:

input_dims (int) – Size of the trailing dimension expected by the wrapped function.
output_dims (int) – Size of the trailing dimension returned by the wrapped function.

s6.vision.detectors module¶

Classical image-space detectors used by the tracking pipeline.

This module contains OpenCV/Numpy-based routines for component detection, circle/rim localization, contour processing, and tip endpoint detection. Each function is designed to be fast and reasonably robust without requiring learned models. Many functions are annotated with Profiler.trace_function to integrate with the project’s lightweight performance tracing.

class s6.vision.detectors.MaskUtils¶

Bases: object

Utility class for generating and caching circular masks and erasing pixels beyond boundaries.

static det_mask(center: Vector2D, size: Tuple[int, int], radius: int) → ndarray¶: Return a binary mask of shape size with ones inside the circle defined by center and radius. Masks are cached per parameters.

static erase_beyond_boundary(image: ndarray, center: Vector2D, radius: float) → None¶: Zero out pixels outside the circular boundary defined by center and radius.

s6.vision.detectors.circle_model(params, x, y)¶

Circle model for fitting.

The model used is: sqrt((x - x0)**2 + (y - y0)**2) - r. Minimizing this function helps to find the circle parameters (x0, y0, r).

Parameters:

params (array-like) – Circle parameters [x0, y0, r].
x (np.ndarray) – X-coordinates of the data points.
y (np.ndarray) – Y-coordinates of the data points.

Returns:

Array of residuals for each data point.

Return type:

np.ndarray

s6.vision.detectors.detect_centroid(image)¶

Detects the centroid of the largest blob in the binary-thresholded image by fitting a circle to its contour.

Parameters:: image (np.ndarray) – Input image to be processed.
Returns:: Coordinates of the blob’s centroid.
Return type:: Vector2D

s6.vision.detectors.detect_components(image: ndarray, area_thresholds: Tuple[float, float] = (600, 5000), square_ratio: float = 1.2) → List[Vector2D]¶

Detect roughly circular (square‐ish) components in an image using connected components.

Parameters:

image (np.ndarray) – The input (grayscale or single‐channel) image.
area_thresholds (Tuple[float, float], optional) – (min, max) area thresholds. If both are in [0, 1], they’re taken as fractions of the total image area; otherwise as absolute pixel‐areas.
square_ratio (float, optional) – Maximum allowed width/height (or height/width) ratio. E.g. 1.2 means up to 20% rectangular distortion is allowed (default: 1.2).

Returns:

The centers of the fitted circles for all components passing filters.

Return type:

List[Vector2D]

s6.vision.detectors.detect_outer_rim(image, min_radius_ratio=0.4, dp=1.2, canny_thresh1=30, canny_thresh2=120, hough_param1=50, hough_param2=30, fallback_thresh=230, fallback_kernel=9)¶

Detect the outer bright rim in image, returning (x, y, r) only if the found radius r >= min_radius_ratio * image_height.

Tries Hough Circle first, then falls back to threshold/morphology.

Params:: image - BGR or grayscale cv2 image min_radius_ratio - min allowed circle radius as fraction of image height dp - inverse ratio for Hough accumulator canny_thresh1/2 - Canny edge thresholds hough_param1 - higher Canny threshold for Hough hough_param2 - accumulator threshold for Hough fallback_thresh - brightness threshold for fallback fallback_kernel - morphology kernel size for fallback

Returns:: (x, y, r) or None

s6.vision.detectors.detect_outer_rim_v2(image, min_radius_ratio=0.4, dp=1.2, canny_thresh1=30, canny_thresh2=120, hough_param1=50, hough_param2=30, downscale: int = 8)¶

Performance-oriented reimplementation of detect_outer_rim() with identical output semantics. It preserves the detection outcome while optimizing steps.

Returns (Vector2D(x, y), min_r) or None, matching v1.

s6.vision.detectors.detect_prong_tips_filtered(img: ndarray, mask: ndarray | None = None, suppression_radius: int = 10, top_k: int = 2, far_point: Tuple[float, float] | None = None) → List[Vector2D]¶

Detect instrument prong tip points in the image, then filter out points that are too close to each other and return the top_k points sorted by distance to the provided far_point (furthest first) or closest to the image center if far_point is None.

Parameters:

img (np.ndarray) – Grayscale image input.
mask (Optional[np.ndarray]) – Binary mask (1=active) specifying region to process. Detection runs only within masked region if provided.
suppression_radius (int) – Minimum distance (in pixels) between accepted points.
top_k (int) – Number of final tip points to return (sorted by distance to far_point if provided, otherwise closest to image center).
far_point (Optional[Tuple[float, float]]) – Reference (x, y) point for sorting. Endpoints are sorted by furthest distance to this point if provided.

Returns:

Filtered tip points.

Return type:

List[Vector2D]

s6.vision.detectors.detect_rim_and_estimate_tilt(image: ndarray, K: ndarray, base_Z_mm: float = 120.0, stick_l_mm: float = 8.0, radius_mm: float = 1.5, min_radius_ratio: float = 0.4, dp: float = 1.2, canny_thresh1: int = 30, canny_thresh2: int = 120, hough_param1: int = 50, hough_param2: int = 30, downscale: int = 8, edge_band_px: float = 20.0, max_edge_pts: int = 400) → Tuple[Tuple[Vector2D, float] | None, Tuple[float, float] | None]¶

Detect the outer rim and estimate tilt directly from an input image.

Parameters:

image (np.ndarray) – Grayscale or BGR image.
K (np.ndarray) – 3x3 camera intrinsic matrix (fx, fy, cx, cy in pixels).
base_Z_mm (float) – Physical setup constants for tilt estimation.
stick_l_mm (float) – Physical setup constants for tilt estimation.
radius_mm (float) – Physical setup constants for tilt estimation.
min_radius_ratio – Parameters forwarded to detect_outer_rim_v2; defaults mirror tuned values.
dp – Parameters forwarded to detect_outer_rim_v2; defaults mirror tuned values.
canny_thresh1 – Parameters forwarded to detect_outer_rim_v2; defaults mirror tuned values.
canny_thresh2 – Parameters forwarded to detect_outer_rim_v2; defaults mirror tuned values.
hough_param1 – Parameters forwarded to detect_outer_rim_v2; defaults mirror tuned values.
hough_param2 – Parameters forwarded to detect_outer_rim_v2; defaults mirror tuned values.
downscale – Parameters forwarded to detect_outer_rim_v2; defaults mirror tuned values.
edge_band_px (float) – Pixel half-width of the annulus around the detected rim used to sample edge points.
max_edge_pts (int) – Cap on the number of edge points (uniform subsample if exceeded).

Returns:

boundary: (Vector2D center, float radius) or None if rim not found tilt: (phi_deg, theta_deg) or None if edge points insufficient or solve fails

Return type:

(boundary, tilt)

s6.vision.detectors.estimate_tilt_from_rim(edge_pts: ndarray, K: ndarray, base_Z_mm: float, stick_l_mm: float, radius_mm: float, refine_iters: int = 4) → Tuple[float, float]¶

Estimate plate tilt (phi, theta) from ring edge points.

Solves the tilt of a gimbal‑mounted plane (no roll about its normal) from a single observed ring by fitting an ellipse in the image and refining a physically‑based projection model.

Parameters:

edge_pts (ndarray) – Sampled ring edge points of shape (N, 2) in pixel coordinates.
K (ndarray) – Camera intrinsics of shape (3, 3) with fx, fy, cx, cy in pixels.
base_Z_mm (float) – Distance from camera origin to the gimbal pivot along camera +Z (mm).
stick_l_mm (float) – Offset from the pivot to the ring plane center along the plane normal (mm).
radius_mm (float) – Physical ring radius on the plane (mm).
refine_iters (int, optional) – Gauss–Newton refinement iterations, by default 4.

Returns:

(phi_deg, theta_deg) in degrees. phi is the tilt magnitude from camera +Z; theta is the azimuth direction of tilt in [0, 360).

Return type:

Tuple[float, float]

Notes

The observed ellipse is fitted as a conic \(Ax^2 + Bxy + Cy^2 + Dx + Ey + F = 0\) and represented by the symmetric matrix \(C_{\text{obs}}\).
Initialization uses ellipse axis ratio and angle: \(\phi_0 \approx \arccos(b/a)\), \(\theta_0 \approx \alpha \pm 90^\circ\), with \(a \ge b\) and \(\alpha\) the ellipse rotation.
Circle samples on the tilted plane are \(\mathbf{P}(t) = \mathbf{P}_0 + r\,(\cos t\,\mathbf{u} + \sin t\,\mathbf{v})\), where \(\mathbf{P}_0 = [0,0,\text{base}_Z]^\top + R[0,0,\text{stick}_\ell]^\top\). Pixels follow the pinhole model \(u = f_x X/Z + c_x,\; v = f_y Y/Z + c_y\).
We minimize the algebraic conic residual via a few Gauss–Newton steps:

\[L(\phi, \theta) = \frac{1}{N} \sum_i \left( \mathbf{x}_i^\top C_{\text{obs}}\, \mathbf{x}_i \right)^2,\]

with \(\mathbf{x}_i = [u_i, v_i, 1]^\top\) from projecting \(\mathbf{P}(t_i)\) using the no‑roll tilt rotation \(R\) that maps \(\mathbf{z}\) to \(\mathbf{d}=[\sin\phi\cos\theta,\,\sin\phi\sin\theta,\,\cos\phi]^\top\).

s6.vision.detectors.find_tip_points(image: ndarray, threshold: int = 120) → Tuple[Vector2D, Vector2D]¶

Identifies the two tip points of the largest contour in the image within a specified circle.

Parameters:

image (np.ndarray) – The input image for processing.
threshold (int, optional) – The threshold value for binary thresholding, by default 120.

Returns:

The two farthest tip points as Vector2D objects.

Return type:

Tuple[Vector2D, Vector2D]

s6.vision.detectors.fit_circle_to_component(image: ndarray, stat: ComponentStat) → Vector2D¶

Fits a circle to a given component in an image.

This function extracts a component from the provided image using the given component statistics. It then checks intensity conditions, resizes the extracted component, finds contours, and computes a minimum enclosing circle of the largest contour. The center of this circle is adjusted based on the original position of the component in the image.

Parameters:

image (np.ndarray) – The input image from which the component will be extracted.
stat (ComponentStat) – A dataclass containing the x, y coordinates, width, and height of the component.

Returns:

A dataclass containing the x, y coordinates of the circle’s center if a circle is successfully fitted; None otherwise.

Return type:

Vector2D or None

s6.vision.detectors.fit_circle_to_contour(contour)¶

Fits a circle to the provided contour using least squares optimization.

Parameters:: contour (np.ndarray) – Contour points in the format (N, 1, 2).
Returns:: The optimized circle center (x, y) and radius.
Return type:: Tuple[float, float, float]

s6.vision.detectors.fit_polynomial_to_contour(contour, degree=3)¶

Fits a polynomial curve to a contour to generate a smoothed 2D curve.

Parameters:

contour (np.ndarray) – The input contour of shape (N, 1, 2).
degree (int, optional) – Degree of the polynomial used for fitting, by default 3.

Returns:

The smoothed contour points of shape (N, 1, 2).

Return type:

np.ndarray

s6.vision.drawing module¶

Drawing helpers and lightweight overlay API for OpenCV images.

This module provides a stateful, but convenient, interface for annotating images produced by the pipeline. The central Markers class exposes static methods to draw points, lines, shapes, text, and simple 3D wireframes onto NumPy image arrays. Two concepts make usage ergonomic:

Target context: Markers.target(image) sets a default image so calls can omit the image argument inside the context manager.
Buffered rendering: Markers.render_scope() batches drawing calls and flushes them in order on exit. This can reduce flicker and keep draw order clear when many annotations are produced per frame.

All colors are BGR tuples as used by OpenCV. Coordinates use the same pixel convention as the rest of the project: Vector2D(x, y) for image space and Vector3D(x, y, z) for world/camera space.

class s6.vision.drawing.Markers(*args, **kwargs)¶

Bases: object

Convenience API for drawing annotations on images.

The class is effectively a singleton and maintains global state for: - enable/disable switch for all drawing operations; - a stack of default target images (managed by target()); - an optional command buffer (managed by render_scope() and render()).

Public methods accept an optional image parameter; when omitted, the current target image from Markers.target(…) is used. Most methods return the image they drew on for convenience/chaining.

classmethod arrow(image: ndarray = None, start: Vector2D = None, end: Vector2D = None, thickness: int = 1, color=(255, 255, 255))¶: Draw an arrow from start to end with ~45° head.

classmethod box(image: ndarray = None, center: Vector2D = None, size: float = 15, width: float = None, height: float = None, color=(255, 255, 255), thickness: int = 1)¶

Draw a rectangle centered at a point.

Parameters:

image (np.ndarray | None) – Target image; if None, uses the current Markers.target image.
center (Vector2D) – Center of the rectangle.
size (float, optional) – Half-side of a square. Ignored if width and height provided.
width (float | None) – Full dimensions of the rectangle in pixels. Both must be provided to override size.
height (float | None) – Full dimensions of the rectangle in pixels. Both must be provided to override size.
color (tuple, optional) – BGR color, by default white.
thickness (int, optional) – Line thickness, by default 1.

classmethod circle(image: ndarray = None, point: Vector2D = None, radius: float = 5, color=(255, 255, 0), thickness: int = 1, filled: bool = True)¶

Draw a filled or outlined circle.

Parameters:

image (np.ndarray | None) – Target image; if None, uses the current Markers.target image.
point (Vector2D) – Circle center.
radius (float, optional) – Radius in pixels, by default 5.
color (tuple, optional) – BGR color, by default yellow.
thickness (int, optional) – Outline thickness if filled=False.
filled (bool, optional) – If True, draw a filled disk; otherwise outline only.

classmethod cross(image: ndarray = None, point: Vector2D = None, color=None, length: int = 15, thickness: int = 1)¶

Draw a diagonal X-shaped cross at a point.

Parameters:

image (np.ndarray | None) – Target image; if None, uses the current Markers.target image.
point (Vector2D) – Center of the X.
color (tuple | None) – BGR color (default derives from image if None).
length (int, optional) – Arm length in pixels, by default 15.
thickness (int, optional) – Line thickness, by default 1.

classmethod crosshair(image: ndarray = None, center: Vector2D = None, r: float = 1000, color: tuple = (255, 255, 255))¶

Draw a crosshair centered on a point.

Parameters:

image (np.ndarray | None) – Target image; if None, uses the current Markers.target image.
center (Vector2D) – Center of the crosshair.
r (float, optional) – Half-length of cross arms in pixels, by default 1000.
color (tuple, optional) – BGR color, by default white.

static disable()¶

Disable all drawing operations globally.

Returns:: Always returns False for convenience.
Return type:: bool

classmethod dot(image: ndarray = None, point: Vector2D = None, color=None)¶

Draw a simple dot marker at a point.

Uses the current Markers.target image if image is None.

static enable()¶

Enable all drawing operations globally.

Returns:: Always returns True for convenience.
Return type:: bool

classmethod guide(image: ndarray = None, point: Vector2D = None, color=(255, 255, 255), text: str = None)¶

Draw a guide arrow from image center toward an external point.

The arrow is clipped at the image boundary with a 5% margin and starts at an inner radius of 200 px from the center. Optional text is drawn near the start.

classmethod icosphere_3d(image: ndarray = None, camera: Camera = None, point: Vector3D = None, radius: float = None, subdivision: int = 2, color=(255, 255, 255), thickness: int = 1)¶

Draw a wireframe icosphere projected into the image.

Parameters:

image (np.ndarray | None) – Target image; if None, uses the current Markers.target image.
camera (Camera) – Camera used to project the mesh.
point (Vector3D) – Center of the sphere in world coordinates.
radius (float) – Sphere radius in world units.
subdivision (int, optional) – Mesh subdivision level, by default 2.
color (tuple, optional) – Line color, by default white.
thickness (int, optional) – Line thickness in pixels.

classmethod line(image: ndarray = None, start: Vector2D = None, end: Vector2D = None, thickness: int = 1, color=(255, 255, 255))¶: Draw a line segment between two points.

classmethod render()¶: Execute all buffered drawing operations.

classmethod render_scope()¶: Context manager that buffers operations until exit, then renders.

classmethod target(tgt: ndarray)¶

Context manager to set a default target image for drawing operations.

Within this context, drawing calls without an explicit image will draw on tgt.

classmethod text(image: ndarray = None, text: str = None, location: Vector2D = None, font=0, fontScale: int = 1, color=(255, 255, 255), thickness: int = 2)¶

Draw a text string at a location.

Parameters:

image (np.ndarray | None) – Target image; if None, uses the current Markers.target image.
text (str) – Text content to draw.
location (Vector2D) – Bottom-left anchor of the text.
font (int, optional) – OpenCV font identifier.
fontScale (int, optional) – Scale factor applied to the font base size.
color (tuple, optional) – BGR color, defaults to white.
thickness (int, optional) – Stroke thickness in pixels.

classmethod vector_text(image: ndarray = None, prefix: str = '', vector=None, location: Vector2D = None, offset: Vector2D = None, font=0, fontScale: int = 1, thickness: int = 2, colors=None, line_height: int = 25, scale: float = 1.0, fmt: str = '{:1.4f}')¶

Draw components of a vector as stacked text lines.

Parameters:

image (np.ndarray | None) – Target image; if None, uses the current Markers.target image.
prefix (str, optional) – Optional text prefix for each line (e.g., “B.”).
vector (Vector2D | Vector3D) – Vector to display.
location (Vector2D) – Bottom-left anchor for the first line.
offset (Vector2D | None) – Additional offset applied to location.
font (int) – cv2.putText parameters.
fontScale (int) – cv2.putText parameters.
thickness (int) – cv2.putText parameters.
colors (sequence[tuple] | None) – Per-component BGR colors; defaults to R/G/B mapping.
line_height (int, optional) – Vertical pixels between lines, by default 25.
scale (float, optional) – Scalar applied to component values, useful for unit conversion.
fmt (str, optional) – Python format string used for numbers.

Returns:

The image the text was drawn on.

Return type:

np.ndarray

s6.vision.solver module¶

Minimal geometric solvers for triangulation and tip estimation.

Provides pure-geometry routines that operate on Camera and simple vector primitives (Vector2D, Vector3D). Implementations rely on PyTorch to leverage the camera utilities and device/dtype management.

class s6.vision.solver.Solver¶

Bases: object

classmethod project_search_region(camera: Camera, center_world: Vector3D, radius_m: float) → Tuple[Vector2D, float, float] | None¶

Project a constant-size 3D spherical search region to the image.

Given a world-space center and a 3D radius (in meters), computes the 2D pixel center and the local pixel half-extents along image x and y by projecting small offsets in the camera frame using the camera’s projection model (including distortion when enabled).

Parameters:

camera (Camera) – Calibrated camera instance.
center_world (Vector3D) – Region center in world coordinates.
radius_m (float) – Search sphere radius in meters.

Returns:

Tuple of (center_px, r_px_x, r_px_y). Returns None if the projection fails or depth is invalid.

Return type:

(Vector2D, float, float) | None

classmethod solve_tip_point(camera: Camera, point: Vector2D, endpoint: Vector3D, length: float) → Vector3D¶

Intersects a camera ray with a sphere to estimate an instrument tip.

Given a 2D detection point in camera and a known endpoint of an instrument in 3D along with the instrument length, the method computes the intersection(s) of the camera ray with the sphere centered at endpoint with radius length. It selects the nearest valid intersection in front of the camera.

Parameters:

camera (Camera) – Calibrated camera used to form the viewing ray.
point (Vector2D) – Pixel coordinate of the tip projection.
endpoint (Vector3D) – Known 3D endpoint of the instrument (world frame).
length (float) – Distance from the endpoint to the tip.

Returns:

Estimated 3D tip position. Returns a zero vector if no valid intersection lies in front of the camera.

Return type:

Vector3D

classmethod triangulate(cam_0: Camera, cam_1: Camera, point_0: Vector2D, point_1: Vector2D) → Vector3D¶

Triangulate a 3D point from two calibrated camera observations.

The method forms two 3D rays from camera centers through the observed pixel coordinates by unprojecting, transforms them into world space, and computes the closest points between the two skew lines. The mid- point of the shortest segment connecting the rays is returned.

Parameters:

cam_0 (Camera) – Calibrated cameras providing intrinsics and extrinsics.
cam_1 (Camera) – Calibrated cameras providing intrinsics and extrinsics.
point_0 (Vector2D) – Pixel observations in the respective image planes.
point_1 (Vector2D) – Pixel observations in the respective image planes.

Returns:

Estimated 3D point in world coordinates. If the rays are nearly parallel, a zero vector is returned as a conservative fallback.

Return type:

Vector3D

s6.vision.test_camera module¶

class s6.vision.test_camera.TestCamera(methodName='runTest')¶

Bases: TestCase

test_calculate_intrinsic_matrix_focal_length()¶

test_calculate_intrinsic_matrix_fov()¶

test_create_intrinsic_matrix()¶

test_distort_fisheye_not_enabled()¶

test_distort_undistort_points_no_distortion()¶

test_from_homogeneous_error()¶

test_meshgrid()¶

test_normalize_denormalize()¶

test_project_and_unproject_identity()¶

test_resize()¶

test_rotation_matrix_from_axis_angle()¶

test_to_dict_and_from_dict()¶

test_to_from_homogeneous()¶

test_to_homogeneous_error()¶

test_transform_and_transform_inv_identity()¶

test_zoom()¶

s6.vision.test_detectors module¶

class s6.vision.test_detectors.TestDetectComponents(methodName='runTest')¶

Bases: TestCase

setUp()¶: Hook method for setting up the test fixture before exercising it.

test_absolute_threshold_exclusion()¶

test_absolute_threshold_inclusion()¶

test_default_threshold()¶

test_fraction_threshold_exclusion()¶

test_fraction_threshold_inclusion()¶

class s6.vision.test_detectors.TestMaskUtils(methodName='runTest')¶

Bases: TestCase

test_det_mask()¶

test_erase_beyond_boundary_color()¶

test_erase_beyond_boundary_grayscale()¶

s6.vision.test_detectors.create_circle_image(shape=(200, 200), center=(100, 100), radius=15)¶: Create a grayscale image with a single filled white circle on a black background.

s6.vision.test_drawing module¶

class s6.vision.test_drawing.TestMarkers(methodName='runTest')¶

Bases: TestCase

setUp()¶: Hook method for setting up the test fixture before exercising it.

test_box_wh_immediate()¶

test_box_wh_missing_dim()¶

test_buffering_and_render()¶

test_buffering_and_render_line_arrow()¶

test_disable_clears_buffer()¶

test_immediate_arrow()¶

test_immediate_circle()¶

test_immediate_line()¶

test_render_scope_context_manager()¶

s6.vision.tracking module¶

s6.vision.tracking.load_calibrations() → Tuple[Dict[str, Camera], ndarray]¶: Load camera calibration parameters from JSON using Pydantic models.

s6.vision.tracking.load_cameras() → Dict[str, Camera]¶: Load camera calibration parameters from JSON using Pydantic models.

s6.vision.trajectory module¶

Kinematic trajectory model for 3D tracking.

Maintains a bounded backlog of state samples and provides simple velocity and acceleration estimates, along with constant‑acceleration prediction of the next state. Timestamps are floats (seconds) from a monotonic or epoch clock; choose a consistent source for best results.

Key concepts¶

Frames store a location and the latest velocity estimate.
Velocity is computed from the last two locations using finite difference.
Acceleration is computed from two consecutive velocity estimates whose times are associated with segment midpoints.
Next state prediction uses constant‑acceleration kinematics.

class s6.vision.trajectory.TrackingFrame(*, location: Vector3D, velocity: Vector3D, timestamp: float)¶

Bases: BaseModel

Single tracking state sample.

location¶

3D position in world coordinates.

Type:: s6.schema.primitives.Vector3D

velocity¶

3D velocity estimate in world coordinates.

Type:: s6.schema.primitives.Vector3D

timestamp¶

Capture time in seconds (monotonic or epoch).

Type:: float

location: Vector3D¶

timestamp: float¶

velocity: Vector3D¶

class s6.vision.trajectory.Trajectory(*, frames: List[TrackingFrame] = None, maxlen: int = 20)¶

Bases: BaseModel

Fixed‑size trajectory buffer with simple kinematics.

Holds recent TrackingFrame samples up to maxlen and provides velocity/acceleration estimates and constant‑acceleration prediction of the next state.

add(location: Vector3D, timestamp: float) → None¶

Append an observation and update the velocity estimate.

Parameters:

location (s6.schema.primitives.Vector3D) – Observed 3D location in world coordinates.
timestamp (float) – Observation time in seconds.

Notes

The latest velocity is computed by finite difference of the last two locations. When the trajectory is empty, velocity is initialised to zero. A minimum dt of 1e-6 is enforced to avoid division by zero.

estimate_acceleration() → Vector3D | None¶

Estimate instantaneous acceleration from recent samples.

Uses two consecutive velocity estimates whose times are associated with segment midpoints: v_{n-1} between (n-2, n-1) and v_n between (n-1, n). Acceleration is the finite difference dv/dt between the velocity midpoints.

Returns:: Acceleration estimate if at least three samples exist, otherwise None.
Return type:: Vector3D | None

estimate_velocity() → Vector3D | None¶

Return the latest velocity estimate.

Returns:: Latest velocity if at least two samples exist, otherwise None.
Return type:: Vector3D | None

frames: List[TrackingFrame]¶

property last_location: Vector3D | None¶: Location of the most recent sample, or None if empty.

property last_timestamp: float | None¶: Timestamp of the most recent sample, or None if empty.

maxlen: int¶

predict_next(dt: float | None = None, timestamp: float | None = None) → TrackingFrame | None¶

Predict the next state under constant acceleration.

Parameters:

dt (float, optional) – Time step in seconds to advance from the last sample. If omitted, the last observed step duration is used.
timestamp (float, optional) – Absolute target timestamp in seconds. When provided, dt is inferred as timestamp - last_timestamp.

Returns:

Predicted sample at the next time step, or None if no samples are available.

Return type:

TrackingFrame | None

Notes

Constant‑acceleration kinematics are applied:

x_next = x + v*dt + 0.5*a*dt^2
v_next = v + a*dt

A minimum step dt >= 1e-6 is enforced when computing from timestamps. When only a single sample exists, velocity is assumed zero.

s6.vision package¶

Submodules¶

s6.vision.camera module¶

Key concepts¶

s6.vision.detectors module¶

s6.vision.drawing module¶

s6.vision.solver module¶

s6.vision.test_camera module¶

s6.vision.test_detectors module¶

s6.vision.test_drawing module¶

s6.vision.tracking module¶

s6.vision.trajectory module¶

Key concepts¶

Module contents¶