System Architecture Overview

This page summarizes the current Sense Core runtime architecture as implemented in src/s6/app/track.py, src/s6/app/_contextgenerators.py, and src/s6/app/pipeline/.

Overview

The runtime is organized into four layers:

  1. Entry layer

    • s6.app.main is the public CLI loader.

    • s6 track dispatches into s6.app.track, which owns argument parsing, config resolution, and mode selection.

  2. Context layer

    • ContextGenerator implementations acquire frames from live GStreamer sources or replay them from a StructuredDataset.

  3. Pipeline layer

    • PipelineLoader validates config, resolves pipeline_name, and instantiates a concrete BasePipeline subclass.

    • Pipeline implementations treat input context keys as read-only and publish only context["export"] plus context["debug"].

  4. Interface layer

    • Headless mode runs in the foreground process.

    • UI mode uses s6.app._gui.MainWindow plus a spawned TrackRuntime worker process.

    • Uplink mode forwards {"export": ...} snapshots to a WebSocket.

Execution Flow

  1. s6.app.main discovers the track command and dispatches to s6.app.track.main().

  2. track parses CLI flags such as --input, --config, --ui, --record-only, --uplink, --repeat, --realtime-playback, and --run-level.

  3. PipelineLoader.load() reads the caller-supplied config path or config object, validates shared fields with PipelineConfigBase, resolves pipeline_name, and parses the pipeline-specific model. s6 track supplies configs/pipeline.config.yaml by default when --config is omitted. The loader can also override run_level from the CLI.

  4. track builds one of these context generators:

    • RemoteGSTContextGenerator for --input gst

    • LocalGSTContextGenerator for --input gst-local

    • LocalGSTContextGeneratorV2 for --input gst-local-v2

    • DatasetContextGenerator for replaying a dataset directory

    • db: inputs are rejected; the old database-backed streamer path is retired

  5. Each generator yields a rolling contexts list where contexts[0] is the newest frame and later entries are prior history.

  6. Unless --record-only is set, the selected pipeline runs on that rolling context list.

  7. Depending on flags, the newest context is displayed in the Qt/VisPy UI, written to an output dataset, forwarded to a telemetry uplink, or processed headlessly.
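The flow above can be sketched as a minimal entry point. This is an illustrative stand-in, not the real s6.app.track code: the flag names come from this page, but the function shape, defaults, and whether --uplink takes a value are assumptions.

```python
import argparse

def main(argv=None):
    """Hypothetical sketch of the s6 track entry point; the real argument
    handling lives in s6.app.track and differs in detail."""
    parser = argparse.ArgumentParser(prog="s6 track")
    parser.add_argument("--input", default="gst")
    # configs/pipeline.config.yaml is supplied when --config is omitted (step 3).
    parser.add_argument("--config", default="configs/pipeline.config.yaml")
    parser.add_argument("--ui", action="store_true")
    parser.add_argument("--record-only", action="store_true")
    parser.add_argument("--uplink")
    parser.add_argument("--repeat", action="store_true")
    parser.add_argument("--realtime-playback", action="store_true")
    parser.add_argument("--run-level")
    args = parser.parse_args(argv)
    # Step 4: each --input value maps to one ContextGenerator class;
    # db: inputs are rejected outright because that path is retired.
    if str(args.input).startswith("db:"):
        raise SystemExit("database-backed inputs are retired")
    return args

args = main(["--input", "gst-local", "--ui"])
print(args.input, args.ui)  # -> gst-local True
```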

Context Generators

ContextGenerator

Base class that:

  • yields a rolling history up to MAX_HISTORY_LENGTH;

  • normalizes timestamps and per-frame metadata such as frame_serial, frame_length, and flags;

  • optionally records contexts to StructuredDataset;

  • drains command-queue messages for replay controls and recording toggles;

  • exports profiler output on shutdown.
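The rolling-history contract can be illustrated with a small generator. This is a sketch of the behavior described above, not the real base class; MAX_HISTORY_LENGTH's value and the context fields beyond frame_serial are assumptions.

```python
from collections import deque

MAX_HISTORY_LENGTH = 8  # assumed value; the real constant lives in the base class

def rolling_contexts(frames, max_history=MAX_HISTORY_LENGTH):
    """Sketch of the ContextGenerator rolling-history contract: yields a list
    where contexts[0] is the newest frame and later entries are prior history."""
    history = deque(maxlen=max_history)
    for serial, frame in enumerate(frames):
        context = {"frame_serial": serial, "frame": frame}
        history.appendleft(context)  # newest first; oldest falls off the end
        yield list(history)

for contexts in rolling_contexts(["a", "b", "c"], max_history=2):
    pass
print([c["frame"] for c in contexts])  # -> ['c', 'b']
```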

RemoteGSTContextGenerator

  • Uses platform.gstreamer.client.

  • Builds remote pygst.client.Client sources.

  • Reorders frames through platform.order_frames(...) on the selected platform implementation.

LocalGSTContextGenerator

  • Uses platform.gstreamer.local.

  • Builds local pygst.client.Pipeline capture sources.

  • Uses MultiCameraCapture and records the synchronized release timestamp as context["timestamp"].

LocalGSTContextGeneratorV2

  • Uses platform.gstreamer.local.

  • Builds one pygst.client.CombinedPipeline source that horizontally combines the configured local cameras.

  • Reads that combined stream through one direct Gst appsink capture, splits the frame back into per-camera images, and timestamps each release with host time.

  • Does not use MultiCameraCapture; sync-only local capture settings are largely irrelevant in this mode because there is only one capture stream.
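Because the combined stream places the configured cameras side by side, the split step reduces to slicing the frame by width. A minimal sketch, assuming equal camera widths; the real helper may handle padding or per-camera geometry differently.

```python
import numpy as np

def split_combined_frame(combined, num_cameras):
    """Sketch of the V2 split step: slice the horizontally combined frame
    back into per-camera images. Assumes equal camera widths."""
    height, width, channels = combined.shape
    cam_width = width // num_cameras
    return [combined[:, i * cam_width:(i + 1) * cam_width]
            for i in range(num_cameras)]

# Two 4x6 camera images combined horizontally into one 4x12 frame.
combined = np.hstack([np.full((4, 6, 3), 0, np.uint8),
                      np.full((4, 6, 3), 255, np.uint8)])
ll, lr = split_combined_frame(combined, 2)
print(ll.shape, lr.shape)  # -> (4, 6, 3) (4, 6, 3)
```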

DatasetContextGenerator

  • Replays a StructuredDataset directory from disk.

  • Supports --repeat for looping.

  • Supports --realtime-playback to pace replay using stored timestamp deltas.

  • Provides random-access frame reads to TrackRuntime, which owns interactive play, pause, stop, forward/backward, seek, and dataset switching semantics.
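The --realtime-playback pacing can be sketched as sleeping for the stored timestamp delta between consecutive frames. Function name and shape are illustrative, not the actual generator code.

```python
import time

def paced_replay(timestamps, sleep=time.sleep, realtime=True):
    """Sketch of --realtime-playback pacing: before releasing each frame,
    sleep for the timestamp delta stored between it and the previous frame."""
    previous = None
    for ts in timestamps:
        if realtime and previous is not None:
            delay = ts - previous
            if delay > 0:
                sleep(delay)
        previous = ts
        yield ts

# Inject a recording sleep to observe the pacing without real delays.
delays = []
list(paced_replay([0.0, 0.5, 1.5], sleep=delays.append))
print(delays)  # -> [0.5, 1.0]
```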

Pipeline Architecture

Pipelines are stateful Python classes, not declarative DAG definitions.

  • All concrete pipelines inherit BasePipeline.

  • Heavy initialization is deferred until first use via _ensure_initialized().

  • load_calibrations() populates self.cameras.

  • load_models() loads the runtime detector models.

  • _process_frame(contexts) performs one frame of work and returns PipelineFrameOutput.

  • _setup_views() and viewport() provide UI bindings when a pipeline has a visual layout.

  • run_level comes from config by default and can be overridden from s6 track with --run-level.

  • Lower run levels reduce preview and overlay work, while debug enables the heaviest diagnostics.

  • Shared geometry helpers live in s6.vision.solver, including calibrated triangulation, tip solving, and rigid model-to-observed pose recovery from matched 3D correspondences.
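The lifecycle above can be summarized with a skeleton class. This follows the method names listed in this section, but the bodies, attribute names, and return shape are placeholders, not the real BasePipeline implementation.

```python
class BasePipelineSketch:
    """Illustrative skeleton of the BasePipeline contract; bodies are stubs."""

    def __init__(self, config):
        self.config = config
        self.cameras = None
        self._initialized = False

    def _ensure_initialized(self):
        # Heavy setup (calibrations, detector models) is deferred to first use.
        if not self._initialized:
            self.load_calibrations()
            self.load_models()
            self._initialized = True

    def load_calibrations(self):
        # Real pipelines populate self.cameras from calibration data.
        self.cameras = {"LL": object(), "LR": object()}

    def load_models(self):
        self.models_loaded = True

    def _process_frame(self, contexts):
        # contexts[0] is the newest frame; returns a PipelineFrameOutput-like dict.
        self._ensure_initialized()
        return {"export": {}, "debug": {"frame_serial": contexts[0].get("frame_serial")}}

p = BasePipelineSketch(config={})
out = p._process_frame([{"frame_serial": 7}])
print(out["debug"]["frame_serial"])  # -> 7
```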

Concrete Pipeline

PipelineT1

  • Camera roles: LL, LR

  • Calibration frame: LL is the world-reference camera (identity extrinsic), and LR extrinsics are expressed relative to that shared LL frame.

  • Per-frame sequence:

    • prepare the current input frame

    • build typed T1CameraFrame, T1RoiInput, and T1RoiDetection stage data

    • run one LL/LR ROI detector batch

    • either use the triplet fast path for detector models with at least three keypoints or fit typed T1LineFit support-line results

    • solve typed T1SolveResult geometry and pose state

    • publish optional derived three-keypoint plus mask training targets under context["debug"]["training_targets"]

    • render overlays into the typed camera buffers while stages publish debug values into the frame-scoped self.debug_context accumulator

    • build export previews

  • The original input keys, including camera images and replay metadata, are preserved for dataset replay. Prepared image/bgr_image buffers remain under each camera key, while ROI data, diagnostic masks, and solver state are published under context["debug"].

  • T1 stage data contracts live in src/s6/app/pipeline/t1_contracts.py. T1 stage helpers should use typed inputs, return typed results, and draw directly into typed camera buffers instead of rebuilding context-shaped dictionaries.

  • The typed ROI-preparation stage optionally predicts each LL/LR ROI by projecting the pipeline’s persistent world-space TrajectoryV2 spherical frame into the current camera when tracking.enable_prediction is enabled. When tracking.roi_prediction_use_extrapolation is false, it instead projects the last accepted TrajectoryV2.last_frame sphere.

    • For T1, each accepted tracker frame encloses only the full observed tip_point/turn_point/end_point triplet; frames missing any one of those points are treated as invalid tracker updates and do not refresh the sphere.

    • The accepted triplet gets tracking.tracking_volume_radius as extra world-space padding, and T1 projects that sphere so its projected major axis becomes the square ROI size. ROI following is driven only by that persistent sphere state, not by output-pose gate status.

    • T1 tries the requested sphere frame first, then falls back through older sphere-derived frames. It resets both ROIs to the default LL/LR search windows only after the configured tracking.roi_prediction_recenter_timeout_sec interval passes without a valid projectable sphere, or after tracking.roi_prediction_recenter_invalid_frames consecutive invalid tracker updates when that optional limit is set.

    • During tolerated invalid updates, TrajectoryV2 holds the last predicted sphere steady instead of extrapolating it farther. The ROI tracking box turns green only when extrapolation is enabled and the EMA-smoothed prediction path in TrajectoryV2 is active.
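The ROI reset policy can be condensed into one predicate. A minimal sketch of the rule described above; the function name and argument shapes are hypothetical, and only the timeout and optional invalid-frame limit are taken from the doc.

```python
def should_recenter(now, last_valid_time, invalid_streak,
                    timeout_sec, invalid_frame_limit=None):
    """Sketch of the ROI reset rule: fall back to the default LL/LR search
    windows after a timeout without a projectable sphere, or after an
    optional count of consecutive invalid tracker updates."""
    if now - last_valid_time >= timeout_sec:
        return True
    if invalid_frame_limit is not None and invalid_streak >= invalid_frame_limit:
        return True
    return False

# Recently valid sphere, tolerated invalid streak: keep following.
print(should_recenter(10.0, 9.5, 3, timeout_sec=1.0, invalid_frame_limit=5))  # -> False
# Timeout exceeded: reset to the default search windows.
print(should_recenter(10.0, 8.0, 0, timeout_sec=1.0))  # -> True
```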

  • The typed solve stage has two detector-output paths:

    • Triplet fast path: if the loaded detector exposes at least three keypoints, T1 requests no mask output, triangulates the predicted tip/turn/end triplet directly, and skips tip erasure, mask-line fitting, and support-line triangulation. tip.triplet_tip_source selects the model tip by default, or the existing intensity-refined tip when set to refined. Invalid triplet frames stay on the fast path and rely on the trajectory/output gate instead of falling back to mask solving.

    • Legacy one-point path: one-point detector models triangulate the T1 tip first, then optionally use that 3D tip to erase a projected sphere from each LL/LR ROI mask via solver.tip_mask_erase_radius, then fit the resulting image lines as LL/LR diagnostic overlays before support-line triangulation.

    • Run-level overlays: at run_level=normal, the stage draws projected 2D connectors for tip_point -> turn_point and turn_point -> end_point, plus a small projected pose-axis indicator recovered by aligning the fixed instrument model points P1/P2/P3 to the solved tip_point/turn_point/end_point world points, and projects the transformed model points back into both camera views. At run_level=dev, it also adds a compact motion/stats box and projected tip velocity arrows. At run_level=debug, it projects the transformed instrument.obj vertices onto the LL frame in the solved LL/world pose.

    • Trajectory updates: when a solve yields the full observed tip_point/turn_point/end_point triplet, the typed solve stage updates the pipeline-owned persistent world-space trajectory from that triplet; otherwise the frame is treated as an invalid tracker update, while the instrument-tip debug payload keeps the frame-local solved geometry and validity fields.

    • Acceleration rejection: if a newly solved tracked-point set would cause an acceleration spike beyond tracking.acceleration_rejection_threshold, TrajectoryV2 rejects that sample, context["debug"]["targets"]["instrument_tip"]["tip_point_raw"] preserves the raw triangulation, context["debug"]["targets"]["instrument_tip"]["tip_tracking_filtered"] is set, and downstream solving/export reuse the predicted fallback tracked points instead. Low-velocity EMA smoothing still applies only to the persistent trajectory’s prediction path; the accepted motion fields remain raw there.

    • Responsibility split: the persistent TrajectoryV2 sphere owns ROI follow/recenter policy, OutputPoseGate remains export-only, and TrajectoryV2 remains the 3D tracking-volume prediction and rejection helper.

    • Export payload: when a stereo line_world is available, the typed solve stage stores an extended display LineSegment3D in context["debug"]["targets"]["instrument_tip"]["line_world"], plus the triangulated tip as tip_point, the closest-point turn_point, a 3 mm along-line end_point marker selected by minimizing the summed LL/LR projection angle against the per-view tip_point -> mask_line_segment_global.centroid direction, the recovered rigid model pose as instrument_pose when all three observed points are available, and pose_solve_valid only when that full triplet yields a recovered pose. It keeps its persistent output-pose tracker in the native virtual camera-B basis, then exports midpoint_3d and instrument_pose_quaternion in the visualizer world basis derived from the visualizer helper’s known B camera placement (midpoint_3d in meters and already in scene/world coordinates; quaternion order [x, y, z, w], ready to apply directly to the original Three.js instrument asset). Whenever preview generation is enabled, it emits base64-encoded quarter-resolution LL and LR previews as context["export"]["bgr_image_ll_base64"] and context["export"]["bgr_image_lr_base64"] after the frame’s buffered marker rendering is flushed.

    • Pose gating: the stage converts the fixed instrument model points from their Three.js/model-local basis into the native Sense/OpenCV basis before pose recovery, and gates both translation and rotation through a persistent output-pose tracker that enters after 3 stable frames, holds the last stable pose across up to 2 dropped frames, and leaves tracking after the configured unstable-frame budget.
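The acceleration-rejection rule and its debug-payload effect can be sketched in a few lines. This is a scalar stand-in for the real 3D logic; both function names and the payload helper are hypothetical, while the field names come from the doc.

```python
def accept_sample(prev_velocity, new_velocity, dt, threshold):
    """Scalar stand-in for the acceleration-rejection test: a sample whose
    implied acceleration exceeds the threshold is rejected."""
    return abs(new_velocity - prev_velocity) / dt <= threshold

def instrument_tip_payload(raw_point, predicted_point, accepted):
    # tip_point_raw always preserves the raw triangulation; on rejection the
    # predicted fallback point is published and tip_tracking_filtered is set.
    return {
        "tip_point_raw": raw_point,
        "tip_point": raw_point if accepted else predicted_point,
        "tip_tracking_filtered": not accepted,
    }

ok = accept_sample(0.10, 0.12, dt=0.033, threshold=5.0)
payload = instrument_tip_payload((1, 2, 3), (1, 2, 2.9), ok)
print(payload["tip_tracking_filtered"])  # -> False
```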

Tracking state for PipelineT1 lives on the pipeline instance itself through its persistent TrajectoryV2 and output-pose gate; frame-local solver output is published under context["debug"]["targets"]["instrument_tip"].
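The output-pose gate hysteresis can be illustrated with a small state machine. The enter/hold thresholds are from the doc, but the class shape is illustrative and it simplifies by treating the hold budget as the unstable-frame budget.

```python
class OutputPoseGateSketch:
    """Sketch of the export gate hysteresis: enter tracking after 3 stable
    frames, hold the last stable pose across up to 2 dropped frames, then
    leave tracking once the drop budget is exceeded."""

    def __init__(self, enter_after=3, hold_drops=2):
        self.enter_after = enter_after
        self.hold_drops = hold_drops
        self.tracking = False
        self.stable_streak = 0
        self.dropped = 0
        self.last_pose = None

    def update(self, pose):
        if pose is not None:
            self.stable_streak += 1
            self.dropped = 0
            if self.stable_streak >= self.enter_after:
                self.tracking = True
            if self.tracking:
                self.last_pose = pose
        else:
            self.stable_streak = 0
            self.dropped += 1
            if self.dropped > self.hold_drops:
                self.tracking = False
                self.last_pose = None
        # While tracking through tolerated drops, the last stable pose is held.
        return self.last_pose if self.tracking else None

gate = OutputPoseGateSketch()
outs = [gate.update(p) for p in ["p1", "p2", "p3", None, None, None, "p4"]]
print(outs)  # -> [None, None, 'p3', 'p3', 'p3', None, None]
```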

Runtime Modes

Headless mode

  • Default mode when --ui is not set.

  • Runs capture and optional inference in the foreground process.

UI mode

  • Enabled with s6 track --ui.

  • Starts a spawned worker process for capture and inference because CUDA and TensorRT initialization are not reliable in a forked child.

  • The main process hosts the Qt/VisPy UI and consumes processed contexts over a multiprocessing queue.
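The spawn-based worker setup can be sketched with the standard multiprocessing API. The worker body and queue layout are assumptions; only the choice of the "spawn" start method (because CUDA/TensorRT state does not survive fork) is from the doc.

```python
import multiprocessing as mp

def worker(command_queue, out_queue):
    """Hypothetical stand-in for the TrackRuntime worker loop: capture and
    inference would run here; processed contexts go back over out_queue."""
    out_queue.put({"export": {}, "debug": {"pid": mp.current_process().pid}})

def start_ui_worker():
    # "spawn" is used deliberately: forked children inherit CUDA/TensorRT
    # state that is not reliable, so a fresh interpreter is started instead.
    ctx = mp.get_context("spawn")
    command_queue = ctx.Queue()
    out_queue = ctx.Queue()
    proc = ctx.Process(target=worker, args=(command_queue, out_queue), daemon=True)
    return proc, command_queue, out_queue

if __name__ == "__main__":
    proc, cmd_q, out_q = start_ui_worker()
    proc.start()
    print(out_q.get(timeout=10)["export"])  # -> {}
    proc.join()
```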

Retired Components

The old streamer-dependent runtime is retired:

  • s6 stream

  • s6 id

  • s6 data collect

  • s6 track -i network

See retired_streamer.md for migration context.