cog/keypoint - Keypoint Training, Preview, And Deployment¶
s6 cog keypoint is the training utility for the COG keypoint model in
sense-core/src/s6/app/cog/keypoint.py.
It can preview augmented samples, run a one-step dry run, train and checkpoint
a GenericKeypointModel, and export ONNX plus TensorRT engines.
The command is driven by an AugmentedKeypointDataset JSON config. If the
--config path does not exist, the command writes the default config template
and exits before doing any preview, training, or export work.
Common Usage¶
# Preview augmented samples batch by batch
s6 cog keypoint --config ./configs/cog/keypoint.json --preview-data
# Generate a default config template if the file does not exist
s6 cog keypoint --config ./configs/cog/new-keypoint.json
# Dry run: build dataset/model, log the graph, run one optimizer step
s6 cog keypoint --config ./configs/cog/keypoint.json --dry-run
# Train for 50 epochs
s6 cog keypoint --config ./configs/cog/keypoint.json --train -e 50 -b 16 -lr 1e-3
# Resume from the latest checkpoint and keep training
s6 cog keypoint --config ./configs/cog/keypoint.json --train --restore latest
# Train, then export the checkpoint from that same run
s6 cog keypoint --config ./configs/cog/keypoint.json --train --deploy
# Export a fixed batch-2 ONNX model and TensorRT engine from the latest checkpoint
s6 cog keypoint --config ./configs/cog/keypoint.json \
--restore latest --deploy --deploy-batch-size 2
# Export without loading a checkpoint first
s6 cog keypoint --config ./configs/cog/keypoint.json \
--deploy --deploy-untrained
Command Flow¶
s6 cog keypoint processes flags in this order:
--preview-data, then --dry-run, then --train, then --deploy
That means a single invocation can preview and then train, or train and then
export. If --train and --deploy are both set without an explicit
--restore, the deploy step uses the checkpoint saved by that training run.
Dataset Config¶
The config must parse as AugmentedKeypointDataset.Config from
src/s6/app/cog/augmented_dataset.py.
The live defaults are:
- prefix: model
- base_dir: empty list, or a string/list of dataset roots
- data_mappings.x: ["B.image"]
- data_mappings.y: [["B.tip_point"]]
- targets.keypoints: true
- targets.segmentation: false
- augmentations: the built-in augmentation list from AugmentedKeypointDataset.Config.default()
- sampling.enabled: false
- num_segmentation_classes: null
- segmentation_loss_weight: 1.0
- calibration_file: null
- stereo_pairing.enabled: false
- loss_terms.supervised_keypoint_loss.weight: 1.0
- loss_terms.segmentation_loss.weight: 1.0
- loss_terms.triangulation_reprojection_loss.weight: 0.0
- loss_terms.triangulation_rigidity_loss.weight: 0.0
data_mappings.y is nested: each outer entry matches one x image key, and
each inner list declares the ordered point datakeys that should be assembled
into that sample’s N x 2 keypoint tensor. All rows must declare the same
number of point keys. Single-keypoint models still use singleton inner lists.
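For example, a two-camera, two-keypoint config might look like the fragment below (the LL/LR image and point datakeys are illustrative; only B.image and B.tip_point appear in the defaults):

```json
{
  "data_mappings": {
    "x": ["LL.image", "LR.image"],
    "y": [
      ["LL.tip_point", "LL.turn_point"],
      ["LR.tip_point", "LR.turn_point"]
    ]
  }
}
```

Each outer y entry lines up with one x image key, and each inner list has the same length (here 2), so every sample yields a 2 x 2 keypoint tensor.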
Training target selection is explicit. Keypoint training requires
targets.keypoints: true. Segmentation training is enabled only when
targets.segmentation: true; that target requires both data_mappings.mask
and num_segmentation_classes, with a class count of at least 2.
num_segmentation_classes is ignored when targets.segmentation is false.
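A minimal segmentation-enabled config could look like this sketch; the B.mask datakey name and the exact shape of data_mappings.mask are assumptions, not taken from the defaults:

```json
{
  "targets": { "keypoints": true, "segmentation": true },
  "data_mappings": {
    "x": ["B.image"],
    "y": [["B.tip_point"]],
    "mask": ["B.mask"]
  },
  "num_segmentation_classes": 2
}
```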
loss_terms controls the scalar multipliers used to build the total training
loss. segmentation_loss_weight is still accepted for older configs and maps
to loss_terms.segmentation_loss.weight when loss_terms is omitted.
Stereo triangulation losses require stereo_pairing.enabled: true,
calibration_file, exactly two LL/LR mappings, and an even --batch_size.
The dataloader reads batch_size / 2 source records and flattens them as
LL0, LR0, LL1, LR1, ..., so even indices are left-camera observations and
odd indices are right-camera observations from the same StructuredDataset line.
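The interleaving above can be sketched as follows (a simplified stand-in for the real dataloader, assuming each source record carries one LL and one LR sample):

```python
def flatten_stereo(records, batch_size):
    """Flatten stereo records into LL0, LR0, LL1, LR1, ... order.

    A batch of `batch_size` consumes `batch_size // 2` source records;
    even batch indices are left-camera samples, odd are right-camera.
    """
    assert batch_size % 2 == 0, "stereo training requires an even batch size"
    batch = []
    for record in records[: batch_size // 2]:
        batch.append(record["LL"])  # even index: left camera
        batch.append(record["LR"])  # odd index: right camera
    return batch

records = [{"LL": "LL0", "LR": "LR0"}, {"LL": "LL1", "LR": "LR1"}]
print(flatten_stereo(records, 4))  # ['LL0', 'LR0', 'LL1', 'LR1']
```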
triangulation_reprojection_loss triangulates predicted LL/LR keypoint pairs,
projects the reconstructed 3D points back through the same camera geometry used
by PipelineT1, and averages the reprojection consistency loss over valid
pairs. triangulation_rigidity_loss uses the first three reconstructed points
and compares tip->turn and turn->end lengths against 1.5 and 2.0 world
units.
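In spirit, the rigidity check compares two segment lengths against the fixed references. The sketch below uses an absolute-difference penalty purely for illustration; the actual loss formulation and reduction are assumptions:

```python
import math

# Reference lengths from the text: tip->turn = 1.5, turn->end = 2.0 world units.
TIP_TURN, TURN_END = 1.5, 2.0

def rigidity_penalty(points):
    """Compare tip->turn and turn->end lengths of the first three
    reconstructed 3D points against the fixed references.
    (Sketch only: the real penalty shape is an assumption.)"""
    tip, turn, end = points[0], points[1], points[2]
    return (abs(math.dist(tip, turn) - TIP_TURN)
            + abs(math.dist(turn, end) - TURN_END))

pts = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.5, 0.0, 0.0)]
print(rigidity_penalty(pts))  # 0.0 when both segments match the references
```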
Training Behavior¶
- The model is GenericKeypointModel(in_channels=1, pretrained_backbone=False).
- The model exports keypoints as (batch, num_keypoints, 2) and heatmaps as (batch, num_keypoints, H, W), even when num_keypoints == 1.
- Input images are converted to grayscale by averaging channels.
- Keypoints are trained with torch.nn.SmoothL1Loss.
- Segmentation logits, when present, are trained with torch.nn.CrossEntropyLoss(ignore_index=MISSING_MASK_IGNORE_INDEX).
- Triangulation losses, when enabled, use the same shared tensor triangulation/projection helpers as the PipelineT1 scalar geometry path.
- Training uses StepLR with gamma=0.33 and a step size of epochs // 5 when epochs >= 5, otherwise 1.
- Checkpoints are written to checkpoints/<timestamp>/checkpoint_latest.pth and also mirrored to checkpoints/checkpoint_latest.pth.
- If --restore latest or --restore <path> is provided, training resumes from that checkpoint and continues writing into the restored checkpoint directory when possible.
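The documented StepLR schedule can be sketched as a closed-form helper (a sketch of the stated rule, not the tool's actual code):

```python
def lr_at_epoch(base_lr, epochs, epoch, gamma=0.33):
    """LR under StepLR with gamma=0.33 and step_size = epochs // 5
    when epochs >= 5, otherwise 1."""
    step_size = epochs // 5 if epochs >= 5 else 1
    return base_lr * gamma ** (epoch // step_size)

# With the defaults (lr=1e-2, 100 epochs) the LR decays every 20 epochs:
print(lr_at_epoch(1e-2, 100, 0))   # 0.01
print(lr_at_epoch(1e-2, 100, 20))  # ~0.0033
```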
Preview Mode¶
--preview-data loads the dataset through a non-shuffled DataLoader, renders
each batch as a square image grid, and opens an OpenCV window named
dataset preview.
Any key advances to the next batch; q or Esc exits preview mode.
Deployment¶
--deploy first exports ONNX with:
- input name: input
- output names: keypoints, heatmaps, and optional mask_logits
- opset_version=17
- dynamo=False
Export uses a dummy tensor shaped like:
(deploy_batch_size, 1, output_size, output_size)
output_size is resolved from the crop stage in the dataset config. The default
deploy batch size is 1.
When --deploy-path is omitted, the exporter writes under
assets/models/ using this filename pattern:
<prefix>_b<batch>_<timestamp>_<precision>.release.onnx
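The naming pattern can be sketched as below; the timestamp format is an assumption, as the doc only specifies the field order:

```python
from datetime import datetime

def onnx_deploy_name(prefix, batch, precision):
    """Build <prefix>_b<batch>_<timestamp>_<precision>.release.onnx
    under assets/models/ (timestamp format is assumed)."""
    ts = datetime.now().strftime("%Y%m%d-%H%M%S")  # assumed format
    return f"assets/models/{prefix}_b{batch}_{ts}_{precision}.release.onnx"

name = onnx_deploy_name("model", 2, "bf16")
print(name)  # e.g. assets/models/model_b2_20250101-120000_bf16.release.onnx
```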
--precision controls both training autocast and the precision tag in the
auto-generated ONNX filename. If omitted, it defaults to bf16 on CUDA and
fp32 elsewhere.
After ONNX export, deploy runs trtexec to write a TensorRT engine next to the
ONNX file. The engine path is derived by replacing .release.onnx or .onnx
with .trt. When the resolved precision is bf16, the command includes
--bf16.
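The engine-path derivation described above amounts to a suffix swap, sketched here:

```python
def engine_path(onnx_path):
    """Derive the TensorRT engine path by replacing .release.onnx
    or .onnx with .trt, as the deploy step does."""
    if onnx_path.endswith(".release.onnx"):
        return onnx_path[: -len(".release.onnx")] + ".trt"
    if onnx_path.endswith(".onnx"):
        return onnx_path[: -len(".onnx")] + ".trt"
    raise ValueError(f"not an ONNX path: {onnx_path}")

print(engine_path("model_b2_ts_bf16.release.onnx"))  # model_b2_ts_bf16.trt
```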
Key Flags¶
- -c, --config - dataset config JSON file, required
- --preview-data - preview dataset batches
- --dry-run - run setup plus one training step
- -t, --train - run training
- --deploy - export ONNX and then convert it to TensorRT after setup/training
- --deploy-untrained - allow ONNX export without restoring a checkpoint
- --restore - checkpoint path or latest
- --deploy-path - explicit ONNX output path
- --deploy-batch-size - fixed export batch size, default 1
- --precision {fp32,bf16} - training precision override
- --no-tb - disable TensorBoard logging
- --log-dir - TensorBoard root directory, default logs/cog
- --verbose-log - enable TensorBoard histogram and image logging
- -w, --num-workers - DataLoader workers, default 4
- --prefetch-factor - prefetched batches per worker, default 4
- -e, --epochs - training epochs, default 100
- -b, --batch_size - training batch size, default 32
- -lr, --learning_rate - learning rate, default 1e-2
- --log-interval - iteration interval for verbose histogram/image logging, default 10
Logging And Outputs¶
Unless --no-tb is passed, TensorBoard logs go under logs/cog/<timestamp>/.
The script logs scalar losses, learning rate, histograms, rendered preview
images, rendered prediction images, and the model graph on the first epoch or
during dry run. Loss scalars include the total train loss, supervised keypoint
loss, optional segmentation loss, and optional triangulation reprojection and
rigidity losses. Histograms and rendered training/prediction image grids are
logged only when --verbose-log is specified.
Preview and training both use the same dataset pipeline from
AugmentedKeypointDataset, so the preview grid, training inputs, and ONNX
export dummy input all reflect the same augmentation config.
PipelineT1 can record derived three-keypoint plus mask training targets under
debug.training_targets.keypoint3_mask when running at normal or higher run
level.
The scope contains a valid entry, plus per-camera LL/LR dictionaries with
tip, turn, end, and mask entries. The example config
configs/cog/data/T1-v3m-3kp-mask.json consumes those debug keys directly, so
the frame schema does not need top-level training-only aliases.