cog/keypoint - Keypoint Training, Preview, And Deployment

s6 cog keypoint is the training utility for the COG keypoint model, implemented in sense-core/src/s6/app/cog/keypoint.py. It can preview augmented samples, run a one-step dry run, train and checkpoint a GenericKeypointModel, and export ONNX models plus TensorRT engines.

The command is driven by an AugmentedKeypointDataset JSON config. If the --config path does not exist, the command writes the default config template and exits before doing any preview, training, or export work.

Common Usage

# Preview augmented samples batch by batch
s6 cog keypoint --config ./configs/cog/keypoint.json --preview-data

# Generate a default config template if the file does not exist
s6 cog keypoint --config ./configs/cog/new-keypoint.json

# Dry run: build dataset/model, log the graph, run one optimizer step
s6 cog keypoint --config ./configs/cog/keypoint.json --dry-run

# Train for 50 epochs
s6 cog keypoint --config ./configs/cog/keypoint.json --train -e 50 -b 16 -lr 1e-3

# Resume from the latest checkpoint and keep training
s6 cog keypoint --config ./configs/cog/keypoint.json --train --restore latest

# Train, then export the checkpoint from that same run
s6 cog keypoint --config ./configs/cog/keypoint.json --train --deploy

# Export a fixed batch-2 ONNX model and TensorRT engine from the latest checkpoint
s6 cog keypoint --config ./configs/cog/keypoint.json \
  --restore latest --deploy --deploy-batch-size 2

# Export without loading a checkpoint first
s6 cog keypoint --config ./configs/cog/keypoint.json \
  --deploy --deploy-untrained

Command Flow

s6 cog keypoint processes flags in this order:

  1. --preview-data

  2. --dry-run

  3. --train

  4. --deploy

That means a single invocation can preview and then train, or train and then export. If --train and --deploy are both set without an explicit --restore, the deploy step uses the checkpoint saved by that training run.

Dataset Config

The config must parse as AugmentedKeypointDataset.Config from src/s6/app/cog/augmented_dataset.py.

The live defaults are:

  • prefix: model

  • base_dir: defaults to an empty list; accepts a single dataset root string or a list of roots

  • data_mappings.x: ["B.image"]

  • data_mappings.y: [["B.tip_point"]]

  • targets.keypoints: true

  • targets.segmentation: false

  • augmentations: the built-in augmentation list from AugmentedKeypointDataset.Config.default()

  • sampling.enabled: false

  • num_segmentation_classes: null

  • segmentation_loss_weight: 1.0

  • calibration_file: null

  • stereo_pairing.enabled: false

  • loss_terms.supervised_keypoint_loss.weight: 1.0

  • loss_terms.segmentation_loss.weight: 1.0

  • loss_terms.triangulation_reprojection_loss.weight: 0.0

  • loss_terms.triangulation_rigidity_loss.weight: 0.0

data_mappings.y is nested: each outer entry matches one x image key, and each inner list declares the ordered point datakeys that should be assembled into that sample’s N x 2 keypoint tensor. All rows must declare the same number of point keys. Single-keypoint models still use singleton inner lists.
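As an illustration of the nesting rule, a hypothetical stereo two-keypoint mapping could look like this (the LL/LR datakey names are illustrative, not taken from a real config):

```json
{
  "data_mappings": {
    "x": ["LL.image", "LR.image"],
    "y": [
      ["LL.tip_point", "LL.turn_point"],
      ["LR.tip_point", "LR.turn_point"]
    ]
  }
}
```

Each outer y entry pairs with one x image key, and both inner lists declare the same number of point keys (two here), so each sample yields a 2 x 2 keypoint tensor.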

Training target selection is explicit. Keypoint training requires targets.keypoints: true. Segmentation training is enabled only when targets.segmentation: true; that target requires both data_mappings.mask and num_segmentation_classes, with a class count of at least 2. num_segmentation_classes is ignored when targets.segmentation is false.
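A hypothetical joint keypoint-plus-segmentation config would enable both targets and supply the mask mapping and class count (the exact shape of data_mappings.mask and the B.mask datakey are illustrative assumptions):

```json
{
  "targets": { "keypoints": true, "segmentation": true },
  "data_mappings": {
    "x": ["B.image"],
    "y": [["B.tip_point"]],
    "mask": ["B.mask"]
  },
  "num_segmentation_classes": 2
}
```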

loss_terms controls the scalar multipliers used to build the total training loss. segmentation_loss_weight is still accepted for older configs and maps to loss_terms.segmentation_loss.weight when loss_terms is omitted.
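The legacy fallback can be pictured as a small normalization step, sketched here (the function and dict shapes are illustrative, not the actual implementation):

```python
def resolve_loss_weights(config: dict) -> dict:
    """Sketch of the documented fallback: when loss_terms is omitted,
    the older segmentation_loss_weight scalar fills in for
    loss_terms.segmentation_loss.weight."""
    defaults = {
        "supervised_keypoint_loss": {"weight": 1.0},
        "segmentation_loss": {"weight": 1.0},
        "triangulation_reprojection_loss": {"weight": 0.0},
        "triangulation_rigidity_loss": {"weight": 0.0},
    }
    if "loss_terms" in config:
        # An explicit loss_terms wins; the legacy scalar is ignored.
        return {**defaults, **config["loss_terms"]}
    merged = dict(defaults)
    merged["segmentation_loss"] = {
        "weight": config.get("segmentation_loss_weight", 1.0)
    }
    return merged
```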

Stereo triangulation losses require stereo_pairing.enabled: true, calibration_file, exactly two LL/LR mappings, and an even --batch_size. The dataloader reads batch_size / 2 source records and flattens them as LL0, LR0, LL1, LR1, ..., so even indices are left-camera observations and odd indices are right-camera observations from the same StructuredDataset line. triangulation_reprojection_loss triangulates predicted LL/LR keypoint pairs, projects the reconstructed 3D points back through the same camera geometry used by PipelineT1, and averages the reprojection consistency loss over valid pairs. triangulation_rigidity_loss uses the first three reconstructed points and compares tip->turn and turn->end lengths against 1.5 and 2.0 world units.
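The interleaving above is plain index bookkeeping, sketched here (record and field names are made up for illustration; the real dataloader works on StructuredDataset records):

```python
def flatten_stereo_batch(records):
    """Each source record carries one LL and one LR observation.
    Flattening them as LL0, LR0, LL1, LR1, ... puts left-camera
    samples at even indices and right-camera samples at odd ones."""
    flat = []
    for rec in records:
        flat.append(rec["LL"])
        flat.append(rec["LR"])
    return flat

# batch_size 4 -> batch_size / 2 = 2 source records are read
records = [{"LL": "LL0", "LR": "LR0"}, {"LL": "LL1", "LR": "LR1"}]
flat = flatten_stereo_batch(records)
left = flat[0::2]   # even indices: left-camera observations
right = flat[1::2]  # odd indices: right-camera observations
```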

Training Behavior

  • The model is GenericKeypointModel(in_channels=1, pretrained_backbone=False).

  • The model exports keypoints as (batch, num_keypoints, 2) and heatmaps as (batch, num_keypoints, H, W), even when num_keypoints == 1.

  • Input images are converted to grayscale by averaging channels.

  • Keypoints are trained with torch.nn.SmoothL1Loss.

  • Segmentation logits, when present, are trained with torch.nn.CrossEntropyLoss(ignore_index=MISSING_MASK_IGNORE_INDEX).

  • Triangulation losses, when enabled, use the same shared tensor triangulation/projection helpers as the PipelineT1 scalar geometry path.

  • Training uses StepLR with gamma=0.33 and a step size of epochs // 5 when epochs >= 5, otherwise 1.

  • Checkpoints are written to checkpoints/<timestamp>/checkpoint_latest.pth and also mirrored to checkpoints/checkpoint_latest.pth.

  • If --restore latest or --restore <path> is provided, training resumes from that checkpoint and continues writing into the restored checkpoint directory when possible.
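The scheduler arithmetic above can be checked with a small sketch (pure Python mirroring StepLR's multiplicative decay, rather than calling torch):

```python
def steplr_schedule(base_lr, epochs, gamma=0.33):
    """Mimics torch.optim.lr_scheduler.StepLR as configured by this
    command: step size is epochs // 5 when epochs >= 5, otherwise 1."""
    step_size = epochs // 5 if epochs >= 5 else 1
    return [base_lr * gamma ** (epoch // step_size) for epoch in range(epochs)]

lrs = steplr_schedule(1e-2, 50)  # step size 10: the LR decays every 10 epochs
```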

Preview Mode

--preview-data loads the dataset through a non-shuffled DataLoader, renders each batch as a square image grid, and opens an OpenCV window named "dataset preview".

  • Any key advances to the next batch.

  • q or Esc exits preview mode.

Deployment

--deploy first exports ONNX with:

  • input name: input

  • output names: keypoints, heatmaps, and optional mask_logits

  • opset_version=17

  • dynamo=False

Export uses a dummy tensor shaped like:

(deploy_batch_size, 1, output_size, output_size)

output_size is resolved from the crop stage in the dataset config. The default deploy batch size is 1.

When --deploy-path is omitted, the exporter writes under assets/models/ using this filename pattern:

<prefix>_b<batch>_<timestamp>_<precision>.release.onnx

--precision controls both training autocast and the precision tag in the auto-generated ONNX filename. If omitted, it defaults to bf16 on CUDA and fp32 elsewhere.

After ONNX export, deploy runs trtexec to write a TensorRT engine next to the ONNX file. The engine path is derived by replacing .release.onnx or .onnx with .trt. When the resolved precision is bf16, the generated trtexec command includes --bf16.
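The naming and path rules can be sketched as follows (the helper names and the timestamp format are illustrative assumptions; the real exporter lives in keypoint.py):

```python
import time

def onnx_filename(prefix, batch, precision, timestamp=None):
    """Builds <prefix>_b<batch>_<timestamp>_<precision>.release.onnx
    under assets/models/, following the documented pattern."""
    timestamp = timestamp or time.strftime("%Y%m%d-%H%M%S")
    return f"assets/models/{prefix}_b{batch}_{timestamp}_{precision}.release.onnx"

def engine_path(onnx_path):
    """Derives the TensorRT engine path by replacing
    .release.onnx (or plain .onnx) with .trt."""
    if onnx_path.endswith(".release.onnx"):
        return onnx_path[: -len(".release.onnx")] + ".trt"
    if onnx_path.endswith(".onnx"):
        return onnx_path[: -len(".onnx")] + ".trt"
    return onnx_path + ".trt"
```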

Key Flags

  • -c, --config - dataset config JSON file, required

  • --preview-data - preview dataset batches

  • --dry-run - run setup plus one training step

  • -t, --train - run training

  • --deploy - export ONNX and then convert it to TensorRT after setup/training

  • --deploy-untrained - allow ONNX export without restoring a checkpoint

  • --restore - checkpoint path or latest

  • --deploy-path - explicit ONNX output path

  • --deploy-batch-size - fixed export batch size, default 1

  • --precision {fp32,bf16} - training precision override

  • --no-tb - disable TensorBoard logging

  • --log-dir - TensorBoard root directory, default logs/cog

  • --verbose-log - enable TensorBoard histogram and image logging

  • -w, --num-workers - DataLoader workers, default 4

  • --prefetch-factor - prefetched batches per worker, default 4

  • -e, --epochs - training epochs, default 100

  • -b, --batch_size - training batch size, default 32

  • -lr, --learning_rate - learning rate, default 1e-2

  • --log-interval - iteration interval for verbose histogram/image logging, default 10

Logging And Outputs

Unless --no-tb is passed, TensorBoard logs go under logs/cog/<timestamp>/. The script logs scalar losses, learning rate, histograms, rendered preview images, rendered prediction images, and the model graph on the first epoch or during dry run. Loss scalars include the total train loss, supervised keypoint loss, optional segmentation loss, and optional triangulation reprojection and rigidity losses. Histograms and rendered training/prediction image grids are logged only when --verbose-log is specified.

Preview and training both use the same dataset pipeline from AugmentedKeypointDataset, so the preview grid, training inputs, and ONNX export dummy input all reflect the same augmentation config.

PipelineT1 can record derived three-keypoint plus mask training targets under debug.training_targets.keypoint3_mask when running at normal or higher run level. The scope contains valid, plus per-camera LL/LR dictionaries with tip, turn, end, and mask entries. The example config configs/cog/data/T1-v3m-3kp-mask.json consumes those debug keys directly, so the frame schema does not need top-level training-only aliases.