data/filter - Dataset browser and filter

Browse one or more StructuredDataset directories, inspect the selected image/keypoint pairs, and delete entries interactively or by query.

Usage

s6 data filter ./path/to/dataset

By default the command looks for B/image and B/tip_point. Override them by repeating the datakey flags once per source:

s6 data filter ./path/to/dataset \
  --image-key LL/image --point-key LL/tip_point \
  --image-key LR/image --point-key LR/tip_point \
  --mask-key LL/mask --mask-key LR/mask
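
The positional pairing of repeated flags can be sketched in Python as follows. This is a minimal illustration, not the command's actual implementation: the function name build_sources is hypothetical, and the sketch assumes that the i-th --image-key, --point-key, and --mask-key describe the same source and that the defaults apply only when a flag is omitted entirely.

```python
import argparse


def build_sources(argv):
    """Pair repeated datakey flags by position (hypothetical helper)."""
    parser = argparse.ArgumentParser(prog="filter-sketch")
    # action="append" collects every occurrence of a flag into a list.
    parser.add_argument("--image-key", action="append", default=None)
    parser.add_argument("--point-key", action="append", default=None)
    parser.add_argument("--mask-key", action="append", default=None)
    args = parser.parse_args(argv)

    # Defaults named in the documentation; the fallback rule is an assumption.
    images = args.image_key or ["B/image"]
    points = args.point_key or ["B/tip_point"]
    masks = args.mask_key  # optional, stays None when not provided

    # Every provided flag must repeat the same number of times.
    provided = [keys for keys in (images, points, masks) if keys is not None]
    if len({len(keys) for keys in provided}) != 1:
        raise ValueError("datakey flags must repeat the same number of times")

    masks = masks or [None] * len(images)
    return list(zip(images, points, masks))
```

Called with the flags from the example above, this sketch would yield one (image, point, mask) triple for LL and one for LR.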

You can also point it at an AugmentedKeypointDataset config JSON and let it pull data_mappings and base_dir from the training config:

s6 data filter --config ./cfg/ds_keypoint.json

Passing a .json path positionally behaves the same as --config.

In config mode the preview is built from raw data_mappings samples before augmentation. For example, the T1 three-keypoint mask config previews each mapped LL/LR training source with its ordered keypoints and mask:

s6 data filter configs/cog/data/T1-v3m-3kp-mask.json

Mapped previews that are missing a required image or keypoint field are skipped. This is expected for inference-produced datasets, where only successful detections publish training targets.

To delete source records that cannot satisfy every configured mapping, scan the config and confirm deletion:

s6 data filter configs/cog/data/T1-v3m-3kp-mask.json --delete-incompatible

Use --yes to skip the cleanup confirmation prompt in scripts.

Query Mode

Use --query to count entries matching a raw dataset expression without opening the interactive viewer:

s6 data filter ./path/to/dataset \
  --query 'debug.training_targets.keypoint3_mask.valid==true'

Add --delete to delete matched entries after confirmation:

s6 data filter ./path/to/dataset \
  --query 'debug.training_targets.keypoint3_mask.valid==false' \
  --delete

Use --yes to skip the confirmation prompt in scripts:

s6 data filter ./path/to/dataset \
  --query 'debug.training_targets.keypoint3_mask.valid==false' \
  --delete --yes

Query expressions support dotted or slash datakeys, JSON-style literals true/false/null, comparisons, and/or/not, and in/not in. Missing fields evaluate as non-matches.
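
The datakey lookup and the missing-field rule can be illustrated with a small Python sketch. It is not the tool's query engine: it assumes each entry can be viewed as a nested dict, covers only an equality comparison, and the names resolve and matches_equal are hypothetical.

```python
_MISSING = object()  # sentinel that distinguishes "absent" from a stored null


def resolve(record, datakey):
    """Walk a nested dict using a dotted or slash-separated datakey."""
    node = record
    for part in datakey.replace("/", ".").split("."):
        if not isinstance(node, dict) or part not in node:
            return _MISSING
        node = node[part]
    return node


def matches_equal(record, datakey, expected):
    """Equality comparison in which a missing field never matches."""
    value = resolve(record, datakey)
    return value is not _MISSING and value == expected
```

Under these assumptions, an entry without debug.training_targets.keypoint3_mask.valid matches neither the ==true nor the ==false query, so a delete driven by ==false would leave it in place.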

Controls

  • a and d move to the previous or next entry in the current dataset.

  • w and s switch to the previous or next non-empty dataset when multiple dataset roots are configured.

  • x deletes the current entry with StructuredDataset.delete_one().

  • q quits.

Behavior

  • Manual mode validates the dataset directory, loads it with structured_dataset.StructuredDataset, and uses the CLI datakeys directly.

  • Config mode reads AugmentedKeypointDataset.Config, accepts base_dir as a string or list, and resolves relative paths against the current working directory. It forces raw, unshuffled, unbalanced preview order so deletion maps predictably back to source records. Incomplete mapped samples are skipped during preview indexing, and --delete-incompatible removes their source records after confirmation.

  • When provided, --image-key, --point-key, and --mask-key must each be repeated the same number of times.

  • Datakeys may use either slash or dot separators.

  • Each selected source is rendered as its own panel with a keypoint cross and a zoomed inset around the current point.

  • In config mode, nested data_mappings.y groups render multiple labeled point crosses on the same panel. Deleting a mapped config preview deletes the underlying source record, so all mapped previews from that record are removed together.

  • Mask overlays are optional; if a configured mask field is missing in a sample, the image still renders without the overlay.
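
The config-mode handling of base_dir described above (a string or a list, with relative paths resolved against the current working directory) can be sketched as follows. This is an illustrative approximation only: resolve_base_dirs is a hypothetical name, and the cwd parameter exists solely to make the sketch easy to exercise.

```python
from pathlib import Path


def resolve_base_dirs(base_dir, cwd=None):
    """Normalize base_dir to a list of paths anchored at the working directory."""
    cwd = Path.cwd() if cwd is None else Path(cwd)
    # Accept a single path or any iterable of paths.
    entries = [base_dir] if isinstance(base_dir, (str, Path)) else list(base_dir)
    # Absolute paths are kept as-is; relative ones are joined onto cwd.
    return [p if p.is_absolute() else cwd / p for p in map(Path, entries)]
```

One consequence of resolving against the working directory rather than the config file's location is that the same config can point at different data depending on where the command is run.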