s6.vision package¶
Submodules¶
s6.vision.camera module¶
s6.vision.detectors module¶
Classical image-space detectors used by the tracking pipeline.
This module contains OpenCV/Numpy-based routines for component detection,
circle/rim localization, contour processing, and tip endpoint detection.
Each function is designed to be fast and reasonably robust without requiring
learned models. Many functions are annotated with Profiler.trace_function
to integrate with the project’s lightweight performance tracing.
- class s6.vision.detectors.MaskUtils¶
Bases: object
Utility class for generating and caching circular masks and erasing pixels beyond boundaries.
- s6.vision.detectors.circle_model(params, x, y)¶
Circle model for fitting.
The model used is: sqrt((x - x0)**2 + (y - y0)**2) - r. Minimizing this function helps to find the circle parameters (x0, y0, r).
- Parameters:
params (array-like) – Circle parameters [x0, y0, r].
x (np.ndarray) – X-coordinates of the data points.
y (np.ndarray) – Y-coordinates of the data points.
- Returns:
Array of residuals for each data point.
- Return type:
np.ndarray
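The residual described above can be sketched in a few lines. This is a minimal standard-library illustration, not the module's implementation: the name `circle_residuals` and the use of plain lists instead of `np.ndarray` are assumptions made so the example has no dependencies.

```python
import math

def circle_residuals(params, xs, ys):
    """Signed distance of each point from the circle (x0, y0, r).

    Hypothetical stand-in for circle_model; works on plain sequences.
    """
    x0, y0, r = params
    return [math.hypot(x - x0, y - y0) - r for x, y in zip(xs, ys)]

# Eight points sampled on a circle of radius 5 centred at (2, -1).
angles = [2 * math.pi * k / 8 for k in range(8)]
xs = [2 + 5 * math.cos(t) for t in angles]
ys = [-1 + 5 * math.sin(t) for t in angles]

# Residuals vanish for the true parameters ...
on_circle = circle_residuals((2.0, -1.0, 5.0), xs, ys)
# ... and equal the radius error when only r is wrong.
too_small = circle_residuals((2.0, -1.0, 4.0), xs, ys)
```

A least-squares optimizer would drive these residuals toward zero by adjusting (x0, y0, r).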
- s6.vision.detectors.detect_centroid(image)¶
Detects the centroid of the largest blob in the binary-thresholded image by fitting a circle to its contour.
- Parameters:
image (np.ndarray) – Input image to be processed.
- Returns:
Coordinates of the blob’s centroid.
- Return type:
- s6.vision.detectors.detect_components(image: ndarray, area_thresholds: Tuple[float, float] = (600, 5000), square_ratio: float = 1.2) List[Vector2D]¶
Detect roughly circular (square‐ish) components in an image using connected components.
- Parameters:
image (np.ndarray) – The input (grayscale or single‐channel) image.
area_thresholds (Tuple[float, float], optional) – (min, max) area thresholds. If both are in [0, 1], they’re taken as fractions of the total image area; otherwise as absolute pixel‐areas.
square_ratio (float, optional) – Maximum allowed width/height (or height/width) ratio. E.g. 1.2 means up to 20% rectangular distortion is allowed (default: 1.2).
- Returns:
The centers of the fitted circles for all components passing filters.
- Return type:
List[Vector2D]
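The two filters described by area_thresholds and square_ratio can be illustrated as follows. This is a sketch under assumptions: the helper names `resolve_area_thresholds` and `is_squareish` are hypothetical and do not appear in the module, and the real function may order or combine these checks differently.

```python
def resolve_area_thresholds(area_thresholds, image_shape):
    """Return (min_area, max_area) in pixels.

    If both values lie in [0, 1] they are treated as fractions of the
    total image area; otherwise they are taken as absolute pixel areas.
    """
    lo, hi = area_thresholds
    if 0 <= lo <= 1 and 0 <= hi <= 1:
        total = image_shape[0] * image_shape[1]
        return lo * total, hi * total
    return float(lo), float(hi)

def is_squareish(width, height, square_ratio=1.2):
    """True if the bounding box is within the allowed aspect distortion."""
    if width <= 0 or height <= 0:
        return False
    return max(width / height, height / width) <= square_ratio

# Fractions of a 200x200 image versus the absolute defaults.
frac_lo, frac_hi = resolve_area_thresholds((0.01, 0.5), (200, 200))
abs_lo, abs_hi = resolve_area_thresholds((600, 5000), (200, 200))
```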
- s6.vision.detectors.detect_outer_rim(image, min_radius_ratio=0.4, dp=1.2, canny_thresh1=30, canny_thresh2=120, hough_param1=50, hough_param2=30, fallback_thresh=230, fallback_kernel=9)¶
Detect the outer bright rim in image, returning (x, y, r) only if the found radius r >= min_radius_ratio * image_height.
Tries Hough Circle first, then falls back to threshold/morphology.
- Parameters:
image – BGR or grayscale cv2 image.
min_radius_ratio – Minimum allowed circle radius as a fraction of image height.
dp – Inverse ratio for the Hough accumulator.
canny_thresh1, canny_thresh2 – Canny edge thresholds.
hough_param1 – Higher Canny threshold for Hough.
hough_param2 – Accumulator threshold for Hough.
fallback_thresh – Brightness threshold for the fallback.
fallback_kernel – Morphology kernel size for the fallback.
- Returns:
(x, y, r) or None
- s6.vision.detectors.detect_outer_rim_v2(image, min_radius_ratio=0.4, dp=1.2, canny_thresh1=30, canny_thresh2=120, hough_param1=50, hough_param2=30, downscale: int = 8)¶
Performance-oriented reimplementation of detect_outer_rim() with identical output semantics. It preserves the detection outcome while optimizing steps.
Returns (Vector2D(x, y), min_r) or None, matching v1.
- s6.vision.detectors.detect_prong_tips_filtered(img: ndarray, mask: ndarray | None = None, suppression_radius: int = 10, top_k: int = 2, far_point: Tuple[float, float] | None = None) List[Vector2D]¶
Detect instrument prong tip points in the image, then filter out points that are too close to each other and return the top_k points sorted by distance to the provided far_point (furthest first) or closest to the image center if far_point is None.
- Parameters:
img (np.ndarray) – Grayscale image input.
mask (Optional[np.ndarray]) – Binary mask (1=active) specifying region to process. Detection runs only within masked region if provided.
suppression_radius (int) – Minimum distance (in pixels) between accepted points.
top_k (int) – Number of final tip points to return (sorted by distance to far_point if provided, otherwise closest to image center).
far_point (Optional[Tuple[float, float]]) – Reference (x, y) point for sorting. Endpoints are sorted by furthest distance to this point if provided.
- Returns:
Filtered tip points.
- Return type:
List[Vector2D]
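The filtering stage (distance suppression followed by top_k selection) might look like the sketch below. It assumes the candidate points arrive as (x, y) tuples in priority order and that earlier candidates win during suppression; neither detail is stated in the reference, and `filter_tips` is a hypothetical name.

```python
import math

def filter_tips(points, image_shape, suppression_radius=10, top_k=2, far_point=None):
    """Greedy radius suppression, then sort and keep top_k points."""
    kept = []
    for p in points:
        # Accept a point only if it is far enough from every accepted point.
        if all(math.dist(p, q) >= suppression_radius for q in kept):
            kept.append(p)
    if far_point is not None:
        # Furthest from the reference point first.
        kept.sort(key=lambda p: math.dist(p, far_point), reverse=True)
    else:
        # Closest to the image center first.
        h, w = image_shape[:2]
        center = (w / 2.0, h / 2.0)
        kept.sort(key=lambda p: math.dist(p, center))
    return kept[:top_k]

candidates = [(10, 10), (12, 11), (90, 90), (50, 52)]
by_far_point = filter_tips(candidates, (100, 100), far_point=(0, 0))
by_center = filter_tips(candidates, (100, 100))
```

Here (12, 11) is suppressed because it lies within 10 px of (10, 10).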
- s6.vision.detectors.detect_rim_and_estimate_tilt(image: ndarray, K: ndarray, base_Z_mm: float = 120.0, stick_l_mm: float = 8.0, radius_mm: float = 1.5, min_radius_ratio: float = 0.4, dp: float = 1.2, canny_thresh1: int = 30, canny_thresh2: int = 120, hough_param1: int = 50, hough_param2: int = 30, downscale: int = 8, edge_band_px: float = 20.0, max_edge_pts: int = 400) Tuple[Tuple[Vector2D, float] | None, Tuple[float, float] | None]¶
Detect the outer rim and estimate tilt directly from an input image.
- Parameters:
image (np.ndarray) – Grayscale or BGR image.
K (np.ndarray) – 3x3 camera intrinsic matrix (fx, fy, cx, cy in pixels).
base_Z_mm (float) – Physical setup constants for tilt estimation.
stick_l_mm (float) – Physical setup constants for tilt estimation.
radius_mm (float) – Physical setup constants for tilt estimation.
min_radius_ratio – Parameters forwarded to detect_outer_rim_v2; defaults mirror tuned values.
dp – Parameters forwarded to detect_outer_rim_v2; defaults mirror tuned values.
canny_thresh1 – Parameters forwarded to detect_outer_rim_v2; defaults mirror tuned values.
canny_thresh2 – Parameters forwarded to detect_outer_rim_v2; defaults mirror tuned values.
hough_param1 – Parameters forwarded to detect_outer_rim_v2; defaults mirror tuned values.
hough_param2 – Parameters forwarded to detect_outer_rim_v2; defaults mirror tuned values.
downscale – Parameters forwarded to detect_outer_rim_v2; defaults mirror tuned values.
edge_band_px (float) – Pixel half-width of the annulus around the detected rim used to sample edge points.
max_edge_pts (int) – Cap on the number of edge points (uniform subsample if exceeded).
- Returns:
boundary: (Vector2D center, float radius), or None if the rim is not found.
tilt: (phi_deg, theta_deg), or None if edge points are insufficient or the solve fails.
- Return type:
(boundary, tilt)
- s6.vision.detectors.estimate_tilt_from_rim(edge_pts: ndarray, K: ndarray, base_Z_mm: float, stick_l_mm: float, radius_mm: float, refine_iters: int = 4) Tuple[float, float]¶
Estimate plate tilt (phi, theta) from ring edge points.
Solves the tilt of a gimbal‑mounted plane (no roll about its normal) from a single observed ring by fitting an ellipse in the image and refining a physically‑based projection model.
- Parameters:
edge_pts (ndarray) – Sampled ring edge points of shape (N, 2) in pixel coordinates.
K (ndarray) – Camera intrinsics of shape (3, 3) with fx, fy, cx, cy in pixels.
base_Z_mm (float) – Distance from camera origin to the gimbal pivot along camera +Z (mm).
stick_l_mm (float) – Offset from the pivot to the ring plane center along the plane normal (mm).
radius_mm (float) – Physical ring radius on the plane (mm).
refine_iters (int, optional) – Gauss–Newton refinement iterations, by default 4.
- Returns:
(phi_deg, theta_deg) in degrees. phi is the tilt magnitude from camera +Z; theta is the azimuth direction of tilt in [0, 360).
- Return type:
Tuple[float, float]
Notes
The observed ellipse is fitted as a conic \(Ax^2 + Bxy + Cy^2 + Dx + Ey + F = 0\) and represented by the symmetric matrix \(C_{\text{obs}}\).
Initialization uses ellipse axis ratio and angle: \(\phi_0 \approx \arccos(b/a)\), \(\theta_0 \approx \alpha \pm 90^\circ\), with \(a \ge b\) and \(\alpha\) the ellipse rotation.
Circle samples on the tilted plane are \(\mathbf{P}(t) = \mathbf{P}_0 + r\,(\cos t\,\mathbf{u} + \sin t\,\mathbf{v})\), where \(\mathbf{P}_0 = [0,0,\text{base}_Z]^\top + R[0,0,\text{stick}_\ell]^\top\). Pixels follow the pinhole model \(u = f_x X/Z + c_x,\; v = f_y Y/Z + c_y\).
We minimize the algebraic conic residual via a few Gauss–Newton steps:
\[L(\phi, \theta) = \frac{1}{N} \sum_i \left( \mathbf{x}_i^\top C_{\text{obs}}\, \mathbf{x}_i \right)^2,\]
with \(\mathbf{x}_i = [u_i, v_i, 1]^\top\) from projecting \(\mathbf{P}(t_i)\) using the no‑roll tilt rotation \(R\) that maps \(\mathbf{z}\) to \(\mathbf{d}=[\sin\phi\cos\theta,\,\sin\phi\sin\theta,\,\cos\phi]^\top\).
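Two pieces of the notes above are easy to check numerically: the initial guess from the ellipse axes and the tilt direction vector d. The sketch below is illustrative only. The function names are hypothetical, the initializer picks the +90° branch of the ± ambiguity arbitrarily, and the Gauss–Newton refinement is not shown.

```python
import math

def initial_tilt_from_ellipse(a, b, alpha_deg):
    """phi0 = arccos(b/a), theta0 = alpha + 90 deg (one of the two branches)."""
    if not (a >= b > 0):
        raise ValueError("expected semi-axes with a >= b > 0")
    phi0 = math.degrees(math.acos(b / a))
    theta0 = (alpha_deg + 90.0) % 360.0
    return phi0, theta0

def tilt_direction(phi_deg, theta_deg):
    """Unit normal d = [sin(phi)cos(theta), sin(phi)sin(theta), cos(phi)]."""
    phi, theta = math.radians(phi_deg), math.radians(theta_deg)
    return (
        math.sin(phi) * math.cos(theta),
        math.sin(phi) * math.sin(theta),
        math.cos(phi),
    )

# An ellipse with axis ratio 1/2 suggests a tilt of about 60 degrees.
phi0, theta0 = initial_tilt_from_ellipse(2.0, 1.0, 30.0)
d = tilt_direction(phi0, theta0)
```

With zero tilt the direction reduces to the camera +Z axis, which matches the definition of phi as the tilt magnitude from +Z.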
- s6.vision.detectors.find_tip_points(image: ndarray, threshold: int = 120) Tuple[Vector2D, Vector2D]¶
Identifies the two tip points of the largest contour in the image within a specified circle.
- s6.vision.detectors.fit_circle_to_component(image: ndarray, stat: ComponentStat) Vector2D¶
Fits a circle to a given component in an image.
This function extracts a component from the provided image using the given component statistics. It then checks intensity conditions, resizes the extracted component, finds contours, and computes a minimum enclosing circle of the largest contour. The center of this circle is adjusted based on the original position of the component in the image.
- Parameters:
image (np.ndarray) – The input image from which the component will be extracted.
stat (ComponentStat) – A dataclass containing the x, y coordinates, width, and height of the component.
- Returns:
A dataclass containing the x, y coordinates of the circle’s center if a circle is successfully fitted; None otherwise.
- Return type:
Vector2D or None
- s6.vision.detectors.fit_circle_to_contour(contour)¶
Fits a circle to the provided contour using least squares optimization.
- Parameters:
contour (np.ndarray) – Contour points in the format (N, 1, 2).
- Returns:
The optimized circle center (x, y) and radius.
- Return type:
Tuple[float, float, float]
- s6.vision.detectors.fit_polynomial_to_contour(contour, degree=3)¶
Fits a polynomial curve to a contour to generate a smoothed 2D curve.
- Parameters:
contour (np.ndarray) – The input contour of shape (N, 1, 2).
degree (int, optional) – Degree of the polynomial used for fitting, by default 3.
- Returns:
The smoothed contour points of shape (N, 1, 2).
- Return type:
np.ndarray
s6.vision.drawing module¶
Drawing helpers and lightweight overlay API for OpenCV images.
This module provides a stateful, but convenient, interface for annotating images produced by the pipeline. The central Markers class exposes static methods to draw points, lines, shapes, text, and simple 3D wireframes onto NumPy image arrays. Two concepts make usage ergonomic:
Target context: Markers.target(image) sets a default image so calls can omit the image argument inside the context manager.
Buffered rendering: Markers.render_scope() batches drawing calls and flushes them in order on exit. This can reduce flicker and keep draw order clear when many annotations are produced per frame.
All colors are BGR tuples as used by OpenCV. Coordinates use the same pixel convention as the rest of the project: Vector2D(x, y) for image space and Vector3D(x, y, z) for world/camera space.
- class s6.vision.drawing.Markers(*args, **kwargs)¶
Bases: object
Convenience API for drawing annotations on images.
The class is effectively a singleton and maintains global state for:
- an enable/disable switch for all drawing operations;
- a stack of default target images (managed by target());
- an optional command buffer (managed by render_scope() and render()).
Public methods accept an optional image parameter; when omitted, the current target image from Markers.target(…) is used. Most methods return the image they drew on for convenience/chaining.
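The target stack and command buffer can be pictured with a small stand-in class. This is a hypothetical sketch of the pattern, not the real Markers implementation: `MiniMarkers` is an invented name, the "image" is a plain list that records draw calls, and the actual class may manage its state differently.

```python
from contextlib import contextmanager

class MiniMarkers:
    """Hypothetical stand-in illustrating target() and render_scope()."""
    _targets = []    # stack of default target "images"
    _buffer = None   # list of pending commands while buffering, else None

    @classmethod
    @contextmanager
    def target(cls, image):
        cls._targets.append(image)
        try:
            yield image
        finally:
            cls._targets.pop()

    @classmethod
    @contextmanager
    def render_scope(cls):
        cls._buffer = []
        try:
            yield
        finally:
            cls.render()

    @classmethod
    def render(cls):
        # Flush buffered commands in the order they were issued.
        pending, cls._buffer = cls._buffer or [], None
        for command in pending:
            command()

    @classmethod
    def dot(cls, image=None, point=None):
        image = image if image is not None else cls._targets[-1]

        def draw():
            image.append(("dot", point))

        if cls._buffer is not None:
            cls._buffer.append(draw)   # defer until render()
        else:
            draw()                     # immediate mode
        return image

canvas = []
with MiniMarkers.target(canvas):
    with MiniMarkers.render_scope():
        MiniMarkers.dot(point=(1, 2))
        MiniMarkers.dot(point=(3, 4))
        drawn_inside = len(canvas)     # still 0: calls are buffered
```

Inside the render scope nothing is drawn; both dots appear on the canvas, in call order, once the scope exits.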
- classmethod arrow(image: ndarray = None, start: Vector2D = None, end: Vector2D = None, thickness: int = 1, color=(255, 255, 255))¶
Draw an arrow from start to end with ~45° head.
- classmethod box(image: ndarray = None, center: Vector2D = None, size: float = 15, width: float = None, height: float = None, color=(255, 255, 255), thickness: int = 1)¶
Draw a rectangle centered at a point.
- Parameters:
image (np.ndarray | None) – Target image; if None, uses the current Markers.target image.
center (Vector2D) – Center of the rectangle.
size (float, optional) – Half-side of a square. Ignored if width and height provided.
width (float | None) – Full dimensions of the rectangle in pixels. Both must be provided to override size.
height (float | None) – Full dimensions of the rectangle in pixels. Both must be provided to override size.
color (tuple, optional) – BGR color, by default white.
thickness (int, optional) – Line thickness, by default 1.
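The difference between size (a half-side) and width/height (full dimensions) is easy to get wrong, so here is a small sketch of the corner arithmetic. `box_corners` is a hypothetical helper, not part of Markers. What the real method does when only one of width or height is given is not stated in this reference; the sketch assumes it falls back to size.

```python
def box_corners(center, size=15, width=None, height=None):
    """Return (top_left, bottom_right) for a rectangle centered at `center`."""
    cx, cy = center
    if width is not None and height is not None:
        # width/height are full dimensions, so halve them.
        half_w, half_h = width / 2.0, height / 2.0
    else:
        # size is already a half-side (assumed fallback).
        half_w = half_h = float(size)
    return (cx - half_w, cy - half_h), (cx + half_w, cy + half_h)

square = box_corners((100, 100))                      # 30 x 30 px square
rect = box_corners((100, 100), width=40, height=20)   # 40 x 20 px rectangle
```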
- classmethod circle(image: ndarray = None, point: Vector2D = None, radius: float = 5, color=(255, 255, 0), thickness: int = 1, filled: bool = True)¶
Draw a filled or outlined circle.
- Parameters:
image (np.ndarray | None) – Target image; if None, uses the current Markers.target image.
point (Vector2D) – Circle center.
radius (float, optional) – Radius in pixels, by default 5.
color (tuple, optional) – BGR color, by default yellow.
thickness (int, optional) – Outline thickness if filled=False.
filled (bool, optional) – If True, draw a filled disk; otherwise outline only.
- classmethod cross(image: ndarray = None, point: Vector2D = None, color=None, length: int = 15, thickness: int = 1)¶
Draw a diagonal X-shaped cross at a point.
- Parameters:
image (np.ndarray | None) – Target image; if None, uses the current Markers.target image.
point (Vector2D) – Center of the X.
color (tuple | None) – BGR color (default derives from image if None).
length (int, optional) – Arm length in pixels, by default 15.
thickness (int, optional) – Line thickness, by default 1.
- classmethod crosshair(image: ndarray = None, center: Vector2D = None, r: float = 1000, color: tuple = (255, 255, 255))¶
Draw a crosshair centered on a point.
- Parameters:
image (np.ndarray | None) – Target image; if None, uses the current Markers.target image.
center (Vector2D) – Center of the crosshair.
r (float, optional) – Half-length of cross arms in pixels, by default 1000.
color (tuple, optional) – BGR color, by default white.
- static disable()¶
Disable all drawing operations globally.
- Returns:
Always returns False for convenience.
- Return type:
bool
- classmethod dot(image: ndarray = None, point: Vector2D = None, color=None)¶
Draw a simple dot marker at a point.
Uses the current Markers.target image if image is None.
- static enable()¶
Enable all drawing operations globally.
- Returns:
Always returns True for convenience.
- Return type:
bool
- classmethod guide(image: ndarray = None, point: Vector2D = None, color=(255, 255, 255), text: str = None)¶
Draw a guide arrow from image center toward an external point.
The arrow is clipped at the image boundary with a 5% margin and starts at an inner radius of 200 px from the center. Optional text is drawn near the start.
- classmethod icosphere_3d(image: ndarray = None, camera: Camera = None, point: Vector3D = None, radius: float = None, subdivision: int = 2, color=(255, 255, 255), thickness: int = 1)¶
Draw a wireframe icosphere projected into the image.
- Parameters:
image (np.ndarray | None) – Target image; if None, uses the current Markers.target image.
camera (Camera) – Camera used to project the mesh.
point (Vector3D) – Center of the sphere in world coordinates.
radius (float) – Sphere radius in world units.
subdivision (int, optional) – Mesh subdivision level, by default 2.
color (tuple, optional) – Line color, by default white.
thickness (int, optional) – Line thickness in pixels.
- classmethod line(image: ndarray = None, start: Vector2D = None, end: Vector2D = None, thickness: int = 1, color=(255, 255, 255))¶
Draw a line segment between two points.
- classmethod render()¶
Execute all buffered drawing operations.
- classmethod render_scope()¶
Context manager that buffers operations until exit, then renders.
- classmethod target(tgt: ndarray)¶
Context manager to set a default target image for drawing operations.
Within this context, drawing calls without an explicit image will draw on tgt.
- classmethod text(image: ndarray = None, text: str = None, location: Vector2D = None, font=0, fontScale: int = 1, color=(255, 255, 255), thickness: int = 2)¶
Draw a text string at a location.
- Parameters:
image (np.ndarray | None) – Target image; if None, uses the current Markers.target image.
text (str) – Text content to draw.
location (Vector2D) – Bottom-left anchor of the text.
font (int, optional) – OpenCV font identifier.
fontScale (int, optional) – Scale factor applied to the font base size.
color (tuple, optional) – BGR color, defaults to white.
thickness (int, optional) – Stroke thickness in pixels.
- classmethod vector_text(image: ndarray = None, prefix: str = '', vector=None, location: Vector2D = None, offset: Vector2D = None, font=0, fontScale: int = 1, thickness: int = 2, colors=None, line_height: int = 25, scale: float = 1.0, fmt: str = '{:1.4f}')¶
Draw components of a vector as stacked text lines.
- Parameters:
image (np.ndarray | None) – Target image; if None, uses the current Markers.target image.
prefix (str, optional) – Optional text prefix for each line (e.g., “B.”).
location (Vector2D) – Bottom-left anchor for the first line.
offset (Vector2D | None) – Additional offset applied to location.
font (int) – cv2.putText parameters.
fontScale (int) – cv2.putText parameters.
thickness (int) – cv2.putText parameters.
colors (sequence[tuple] | None) – Per-component BGR colors; defaults to R/G/B mapping.
line_height (int, optional) – Vertical pixels between lines, by default 25.
scale (float, optional) – Scalar applied to component values, useful for unit conversion.
fmt (str, optional) – Python format string used for numbers.
- Returns:
The image the text was drawn on.
- Return type:
np.ndarray
s6.vision.solver module¶
Minimal geometric solvers for triangulation and tip estimation.
Provides pure-geometry routines that operate on Camera and simple vector primitives (Vector2D, Vector3D). Implementations rely on PyTorch to leverage the camera utilities and device/dtype management.
- class s6.vision.solver.Solver¶
Bases: object
- classmethod project_search_region(camera: Camera, center_world: Vector3D, radius_m: float) Tuple[Vector2D, float, float] | None¶
Project a constant-size 3D spherical search region to the image.
Given a world-space center and a 3D radius (in meters), computes the 2D pixel center and the local pixel half-extents along image x and y by projecting small offsets in the camera frame using the camera’s projection model (including distortion when enabled).
- Parameters:
- Returns:
Tuple of (center_px, r_px_x, r_px_y). Returns None if the projection fails or depth is invalid.
- Return type:
(Vector2D, float, float) | None
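The idea of projecting small offsets to obtain pixel half-extents can be sketched with an ideal pinhole model. This is a simplified illustration under stated assumptions: the center is already expressed in the camera frame, lens distortion is ignored, and the function name and the explicit fx, fy, cx, cy arguments are hypothetical rather than part of the Solver or Camera API.

```python
def project_search_region_pinhole(center_cam, radius_m, fx, fy, cx, cy):
    """Return ((u, v), r_px_x, r_px_y), or None if the depth is invalid."""
    X, Y, Z = center_cam
    if Z <= 0:
        return None  # point is at or behind the camera plane

    def project(x, y, z):
        return fx * x / z + cx, fy * y / z + cy

    u, v = project(X, Y, Z)
    # Offset the center by the radius along camera x and y, then re-project.
    u_dx, _ = project(X + radius_m, Y, Z)
    _, v_dy = project(X, Y + radius_m, Z)
    return (u, v), abs(u_dx - u), abs(v_dy - v)

# A 10 cm sphere 2 m in front of a camera with a 1000 px focal length.
region = project_search_region_pinhole((0.0, 0.0, 2.0), 0.1, 1000.0, 1000.0, 640.0, 360.0)
```

For this undistorted case the half-extent reduces to f * r / Z, which is 50 px here.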
- classmethod solve_tip_point(camera: Camera, point: Vector2D, endpoint: Vector3D, length: float) Vector3D¶
Intersects a camera ray with a sphere to estimate an instrument tip.
Given a 2D detection point in camera and a known endpoint of an instrument in 3D along with the instrument length, the method computes the intersection(s) of the camera ray with the sphere centered at endpoint with radius length. It selects the nearest valid intersection in front of the camera.
- Parameters:
- Returns:
Estimated 3D tip position. Returns a zero vector if no valid intersection lies in front of the camera.
- Return type:
- classmethod triangulate(cam_0: Camera, cam_1: Camera, point_0: Vector2D, point_1: Vector2D) Vector3D¶
Triangulate a 3D point from two calibrated camera observations.
The method forms two 3D rays from camera centers through the observed pixel coordinates by unprojecting, transforms them into world space, and computes the closest points between the two skew lines. The midpoint of the shortest segment connecting the rays is returned.
- Parameters:
- Returns:
Estimated 3D point in world coordinates. If the rays are nearly parallel, a zero vector is returned as a conservative fallback.
- Return type:
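The closest-point midpoint between two rays has a closed form. The sketch below assumes both rays are already given in world space as an origin and a direction; the function name is hypothetical, and it returns None for near-parallel rays where the documented method returns a zero vector.

```python
def closest_point_midpoint(p0, d0, p1, d1, eps=1e-9):
    """Midpoint of the shortest segment between lines p0 + s*d0 and p1 + t*d1."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    w = [a - b for a, b in zip(p0, p1)]
    a, b, c = dot(d0, d0), dot(d0, d1), dot(d1, d1)
    d, e = dot(d0, w), dot(d1, w)
    denom = a * c - b * b
    if abs(denom) < eps:
        return None  # rays are (nearly) parallel
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    q0 = [p + s * k for p, k in zip(p0, d0)]   # closest point on ray 0
    q1 = [p + t * k for p, k in zip(p1, d1)]   # closest point on ray 1
    return tuple((x + y) / 2.0 for x, y in zip(q0, q1))

# Two skew lines whose closest points are (2, 0, 0) and (2, 0, 2).
skew_mid = closest_point_midpoint((0, 0, 0), (1, 0, 0), (2, -1, 2), (0, 1, 0))
# Two rays from cameras at x = -1 and x = +1 that meet at (0, 0, 5).
meet = closest_point_midpoint((-1, 0, 0), (1, 0, 5), (1, 0, 0), (-1, 0, 5))
parallel = closest_point_midpoint((0, 0, 0), (0, 0, 1), (1, 0, 0), (0, 0, 1))
```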
s6.vision.test_camera module¶
- class s6.vision.test_camera.TestCamera(methodName='runTest')¶
Bases: TestCase
- test_calculate_intrinsic_matrix_focal_length()¶
- test_calculate_intrinsic_matrix_fov()¶
- test_create_intrinsic_matrix()¶
- test_distort_fisheye_not_enabled()¶
- test_distort_undistort_points_no_distortion()¶
- test_from_homogeneous_error()¶
- test_meshgrid()¶
- test_normalize_denormalize()¶
- test_project_and_unproject_identity()¶
- test_resize()¶
- test_rotation_matrix_from_axis_angle()¶
- test_to_dict_and_from_dict()¶
- test_to_from_homogeneous()¶
- test_to_homogeneous_error()¶
- test_transform_and_transform_inv_identity()¶
- test_zoom()¶
s6.vision.test_detectors module¶
- class s6.vision.test_detectors.TestDetectComponents(methodName='runTest')¶
Bases: TestCase
- setUp()¶
Hook method for setting up the test fixture before exercising it.
- test_absolute_threshold_exclusion()¶
- test_absolute_threshold_inclusion()¶
- test_default_threshold()¶
- test_fraction_threshold_exclusion()¶
- test_fraction_threshold_inclusion()¶
- class s6.vision.test_detectors.TestMaskUtils(methodName='runTest')¶
Bases: TestCase
- test_det_mask()¶
- test_erase_beyond_boundary_color()¶
- test_erase_beyond_boundary_grayscale()¶
- s6.vision.test_detectors.create_circle_image(shape=(200, 200), center=(100, 100), radius=15)¶
Create a grayscale image with a single filled white circle on a black background.
s6.vision.test_drawing module¶
- class s6.vision.test_drawing.TestMarkers(methodName='runTest')¶
Bases: TestCase
- setUp()¶
Hook method for setting up the test fixture before exercising it.
- test_box_wh_immediate()¶
- test_box_wh_missing_dim()¶
- test_buffering_and_render()¶
- test_buffering_and_render_line_arrow()¶
- test_disable_clears_buffer()¶
- test_immediate_arrow()¶
- test_immediate_circle()¶
- test_immediate_line()¶
- test_render_scope_context_manager()¶
s6.vision.tracking module¶
s6.vision.trajectory module¶
Kinematic trajectory model for 3D tracking.
Maintains a bounded backlog of state samples and provides simple velocity and acceleration estimates, along with constant‑acceleration prediction of the next state. Timestamps are floats (seconds) from a monotonic or epoch clock; choose a consistent source for best results.
Key concepts¶
Frames store a location and the latest velocity estimate.
Velocity is computed from the last two locations using finite difference.
Acceleration is computed from two consecutive velocity estimates whose times are associated with segment midpoints.
Next state prediction uses constant‑acceleration kinematics.
- class s6.vision.trajectory.TrackingFrame(*, location: Vector3D, velocity: Vector3D, timestamp: float)¶
Bases: BaseModel
Single tracking state sample.
- location¶
3D position in world coordinates.
- velocity¶
3D velocity estimate in world coordinates.
- timestamp¶
Capture time in seconds (monotonic or epoch).
- Type:
float
- timestamp: float¶
- class s6.vision.trajectory.Trajectory(*, frames: List[TrackingFrame] = None, maxlen: int = 20)¶
Bases: BaseModel
Fixed‑size trajectory buffer with simple kinematics.
Holds recent TrackingFrame samples up to maxlen and provides velocity/acceleration estimates and constant‑acceleration prediction of the next state.
- add(location: Vector3D, timestamp: float) None¶
Append an observation and update the velocity estimate.
- Parameters:
location (s6.schema.primitives.Vector3D) – Observed 3D location in world coordinates.
timestamp (float) – Observation time in seconds.
Notes
The latest velocity is computed by finite difference of the last two locations. When the trajectory is empty, velocity is initialised to zero. A minimum dt of 1e-6 is enforced to avoid division by zero.
- estimate_acceleration() Vector3D | None¶
Estimate instantaneous acceleration from recent samples.
Uses two consecutive velocity estimates whose times are associated with segment midpoints: v_{n-1} between (n-2, n-1) and v_n between (n-1, n). Acceleration is the finite difference dv/dt between the velocity midpoints.
- Returns:
Acceleration estimate if at least three samples exist, otherwise None.
- Return type:
Vector3D | None
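The midpoint scheme can be sketched with plain tuples. This is an illustration under assumptions: samples are (location, timestamp) pairs rather than TrackingFrame objects, the helper names are hypothetical, and applying the 1e-6 floor to the midpoint interval is a guess rather than documented behavior.

```python
def finite_difference_velocity(x_prev, t_prev, x_curr, t_curr, min_dt=1e-6):
    """Velocity over one segment, with a floor on dt to avoid division by zero."""
    dt = max(t_curr - t_prev, min_dt)
    return tuple((c - p) / dt for p, c in zip(x_prev, x_curr))

def midpoint_acceleration(samples, min_dt=1e-6):
    """Acceleration from the last three (location, timestamp) samples."""
    if len(samples) < 3:
        return None
    (x0, t0), (x1, t1), (x2, t2) = samples[-3:]
    v_prev = finite_difference_velocity(x0, t0, x1, t1, min_dt)
    v_curr = finite_difference_velocity(x1, t1, x2, t2, min_dt)
    # Each velocity is attributed to the midpoint of its segment.
    dt_mid = max(0.5 * (t1 + t2) - 0.5 * (t0 + t1), min_dt)
    return tuple((vc - vp) / dt_mid for vp, vc in zip(v_prev, v_curr))

# Samples of x(t) = t^2 along the x axis, i.e. constant acceleration 2.
samples = [((0.0, 0.0, 0.0), 0.0), ((1.0, 0.0, 0.0), 1.0), ((4.0, 0.0, 0.0), 2.0)]
accel = midpoint_acceleration(samples)
too_few = midpoint_acceleration(samples[:2])
```

For uniformly accelerated motion the midpoint attribution recovers the acceleration exactly, which is why the velocities are not timestamped at the segment ends.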
- estimate_velocity() Vector3D | None¶
Return the latest velocity estimate.
- Returns:
Latest velocity if at least two samples exist, otherwise None.
- Return type:
Vector3D | None
- frames: List[TrackingFrame]¶
- property last_timestamp: float | None¶
Timestamp of the most recent sample, or None if empty.
- maxlen: int¶
- predict_next(dt: float | None = None, timestamp: float | None = None) TrackingFrame | None¶
Predict the next state under constant acceleration.
- Parameters:
dt (float, optional) – Time step in seconds to advance from the last sample. If omitted, the last observed step duration is used.
timestamp (float, optional) – Absolute target timestamp in seconds. When provided, dt is inferred as timestamp - last_timestamp.
- Returns:
Predicted sample at the next time step, or None if no samples are available.
- Return type:
TrackingFrame | None
Notes
Constant‑acceleration kinematics are applied:
x_next = x + v*dt + 0.5*a*dt^2
v_next = v + a*dt
A minimum step dt >= 1e-6 is enforced when computing from timestamps. When only a single sample exists, velocity is assumed zero.
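The two update equations translate directly into code. The sketch below is a minimal illustration using tuples; `predict_constant_acceleration` is a hypothetical name, and the handling of the dt and timestamp arguments described above is left out.

```python
def predict_constant_acceleration(x, v, a, dt):
    """Advance position and velocity by dt under constant acceleration."""
    x_next = tuple(xi + vi * dt + 0.5 * ai * dt * dt for xi, vi, ai in zip(x, v, a))
    v_next = tuple(vi + ai * dt for vi, ai in zip(v, a))
    return x_next, v_next

# Start at the origin moving at 1 unit/s along x, accelerating at 2 units/s^2.
x_next, v_next = predict_constant_acceleration(
    (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), dt=2.0
)
```

After 2 s the position is 1*2 + 0.5*2*4 = 6 along x and the velocity is 1 + 2*2 = 5.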