vision3d.viz

Optional 3D visualization utilities.

Requires the viz dependency group:

pip install vision3d[viz]

Functions

`camera_grid`(camera_names[, grid, ...])	Build a 2D camera-panel grid from a dataset's rig metadata.
`fusion_layout`(camera_names[, grid, ...])	Build a fusion-sample layout with a 3D view above a camera grid.
`lidar_view`(*[, entity_prefix, name])	Build a 3D view of the world entity tree.
`log_boxes_3d`(entity, boxes, *[, labels, ...])	Log 3D bounding boxes to Rerun.
`log_cameras`(entity_prefix, images[, ...])	Log all camera images with optional pinhole projection to Rerun.
`log_point_cloud`(entity, points, *[, ...])	Log a point cloud to Rerun.
`log_sample`(inputs[, targets, predictions, ...])	Log a full sample dict to Rerun.

vision3d.viz.camera_grid(camera_names, grid=None, *, entity_prefix='world/cam', overlay_entities=('world/gt/boxes', 'world/pred/boxes'))

Build a 2D camera-panel grid from a dataset’s rig metadata.

Each cell in grid is an index into camera_names. Entity origins follow log_cameras’ {entity_prefix}_{i} convention so this helper pairs directly with vision3d.viz.log_cameras().

Panels are emitted row-major into a Grid with grid_columns set to the widest row.

Parameters:

camera_names (Sequence[str]) – Per-camera display names indexed by tensor position.
grid (Sequence[Sequence[int]] | None) – Row-major grid of indices into camera_names. Pass None if the dataset hasn’t declared a rig layout; the panels then fall back to a single row in tensor order.
entity_prefix (str) – Prefix for camera entity origins (e.g. "world/cam" -> /world/cam_0, /world/cam_1 …).
overlay_entities (Sequence[str] | None) – Box entities to overlay on every camera panel (e.g. ("world/gt/boxes", "world/pred/boxes")). All overlays are rendered as "majorwireframe" in the projections, since filled boxes would occlude the underlying image. Pass None or an empty sequence to skip the overlay.

Returns:

A Grid containing one Spatial2DView per declared camera.

Raises:

ValueError – If any index is out of range for camera_names.

Return type:

Grid
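
The grid rules above (single-row fallback, {entity_prefix}_{i} origins, range check, column count from the widest row) can be sketched in plain Python. This is only an illustration of the documented behavior, not the library's implementation: resolve_panels is a hypothetical helper, and rejecting negative indices is an assumption.

```python
from typing import Optional, Sequence


def resolve_panels(
    camera_names: Sequence[str],
    grid: Optional[Sequence[Sequence[int]]] = None,
    entity_prefix: str = "world/cam",
) -> tuple[list[tuple[str, str]], int]:
    """Hypothetical helper: return (panels, grid_columns).

    Each panel is an (entity_origin, display_name) pair in row-major order.
    """
    if grid is None:
        # Fallback: a single row in tensor order.
        grid = [list(range(len(camera_names)))]

    panels = []
    for row in grid:
        for idx in row:
            # Assumption: negative indices count as out of range.
            if not 0 <= idx < len(camera_names):
                raise ValueError(
                    f"grid index {idx} is out of range for "
                    f"{len(camera_names)} cameras"
                )
            # Origins follow the {entity_prefix}_{i} convention.
            panels.append((f"/{entity_prefix}_{idx}", camera_names[idx]))

    # The column count is set by the widest row.
    grid_columns = max((len(row) for row in grid), default=0)
    return panels, grid_columns


names = ["front", "front_left", "front_right", "back"]
panels, columns = resolve_panels(names, grid=[[1, 0, 2], [3]])
print(columns)    # 3
print(panels[0])  # ('/world/cam_1', 'front_left')
```

Note that a grid cell holds the tensor index, so the panel at the top-left of this example shows camera 1 and is rooted at /world/cam_1, regardless of its position in the grid.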

vision3d.viz.fusion_layout(camera_names, grid=None, *, entity_prefix='world', row_shares=(3, 2), name=None)

Build a fusion-sample layout with a 3D view above a camera grid.

Composes lidar_view() and camera_grid() under matching entity prefixes that align with vision3d.viz.log_sample()’s defaults (world/cam_* for cameras, world/gt/boxes and world/pred/boxes for the box overlays).

Parameters:

camera_names (Sequence[str]) – Per-camera display names indexed by tensor position.
grid (Sequence[Sequence[int]] | None) – Row-major grid of indices into camera_names. See camera_grid().
entity_prefix (str) – Root entity prefix; the 3D view roots at /{entity_prefix}, cameras at /{entity_prefix}/cam_*, box overlays at /{entity_prefix}/gt/boxes and /{entity_prefix}/pred/boxes.
row_shares (Sequence[int]) – Vertical split ratio between the 3D view (first value) and the camera grid (second value).
name (str | None) – Optional display name.

Returns:

A Vertical container stacking the 3D view and camera grid.

Return type:

Vertical
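
As a rough illustration of how a single root prefix fans out into the paths listed under entity_prefix, and of what the default row_shares=(3, 2) means as a height split, consider the sketch below. Both helpers are hypothetical and exist only for this example; treating row_shares as relative weights is an assumption about the underlying container.

```python
from typing import Sequence


def fusion_entity_paths(entity_prefix: str = "world", num_cameras: int = 0) -> dict:
    """Hypothetical helper: entity paths implied by a root prefix."""
    root = f"/{entity_prefix}"
    return {
        "view_3d": root,
        "cameras": [f"{root}/cam_{i}" for i in range(num_cameras)],
        "overlays": [f"{root}/gt/boxes", f"{root}/pred/boxes"],
    }


def share_fractions(row_shares: Sequence[int] = (3, 2)) -> list[float]:
    """Hypothetical helper: relative row shares as fractions of total height.

    Assumption: shares are relative weights, so (3, 2) and (6, 4) are equivalent.
    """
    total = sum(row_shares)
    return [share / total for share in row_shares]


paths = fusion_entity_paths("world", num_cameras=2)
print(paths["cameras"])   # ['/world/cam_0', '/world/cam_1']
print(share_fractions())  # [0.6, 0.4]
```

Under that assumption, the default gives the 3D view 60% of the height and the camera grid the remaining 40%.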

vision3d.viz.lidar_view(*, entity_prefix='world', name='3D')

Build a 3D view of the world entity tree.

The view captures everything under /{entity_prefix}, typically the lidar point cloud, 3D boxes, and any logged camera frustums. Pairs with vision3d.viz.log_point_cloud() and vision3d.viz.log_sample().

Parameters:

entity_prefix (str) – Origin entity path (without leading slash).
name (str) – Display name shown in the view’s title bar.

Returns:

A Spatial3DView rooted at /{entity_prefix}.

Return type:

Spatial3DView

vision3d.viz.log_boxes_3d(entity, boxes, *, labels=None, class_ids=None, label_to_id=None, scores=None, score_threshold=None, fill_mode=None, show_labels=None, log_heading=True)

Log 3D bounding boxes to Rerun.

Logs boxes as rr.Boxes3D and optionally heading arrows as rr.Arrows3D on a /heading sub-entity. Designed to serve both ground-truth and prediction boxes: route each to its own entity (e.g. "world/gt/boxes" vs "world/pred/boxes") and distinguish them visually with fill_mode while keeping per-class colors.

Parameters:

entity (str) – Rerun entity path (e.g. "world/gt/boxes").
boxes (BoundingBoxes3D) – Bounding boxes in any supported format.
labels (list[str] | None) – Per-box label strings for display. When scores is given, the score is appended to each label.
class_ids (list[int] | None) – Per-box class IDs for coloring via AnnotationContext.
label_to_id (dict[str, int] | None) – Mapping from class name to class ID. When provided, an rr.AnnotationContext is logged statically on the entity so class_ids resolve to consistent colors and display names across frames.
scores (list[float] | Tensor | None) – Per-box confidence scores. When given, each box label shows its score (e.g. "car 0.87").
score_threshold (float | None) – If set, boxes with scores below this value are dropped before logging. Requires scores.
fill_mode (FillMode | Literal['DenseWireframe', 'MajorWireframe', 'Solid', 'TransparentFillMajorWireframe', 'densewireframe', 'majorwireframe', 'solid', 'transparentfillmajorwireframe'] | int | None) – Box fill mode (e.g. "majorwireframe", "densewireframe", "solid").
show_labels (bool | None) – Force per-box labels on (True) or off (False) in the viewer. None leaves Rerun’s default heuristic, which hides labels when there are many boxes.
log_heading (bool) – If True and boxes have rotation, log heading arrows.

Raises:

ValueError – If score_threshold is set without scores, or if scores, labels, or class_ids length does not match the number of boxes.

Return type:

None
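
The interaction between labels, scores, and score_threshold can be sketched without Rerun. The helper below is hypothetical and only mirrors the rules stated above: it validates lengths, drops boxes scoring below the threshold, and appends the score to each label. The two-decimal format is an assumption based on the "car 0.87" example, as is keeping boxes whose score equals the threshold.

```python
from typing import Optional, Sequence


def filter_and_label(
    num_boxes: int,
    labels: Optional[Sequence[str]] = None,
    scores: Optional[Sequence[float]] = None,
    score_threshold: Optional[float] = None,
) -> tuple[list[int], Optional[list[str]]]:
    """Hypothetical helper: return (kept box indices, display labels)."""
    if score_threshold is not None and scores is None:
        raise ValueError("score_threshold requires scores")
    for name, values in (("labels", labels), ("scores", scores)):
        if values is not None and len(values) != num_boxes:
            raise ValueError(
                f"{name} has length {len(values)}, expected {num_boxes}"
            )

    keep = list(range(num_boxes))
    if score_threshold is not None:
        # Boxes scoring below the threshold are dropped; equal scores are kept.
        keep = [i for i in keep if scores[i] >= score_threshold]

    if labels is None:
        return keep, None
    if scores is None:
        return keep, [labels[i] for i in keep]
    # Assumption: two-decimal formatting, as in the "car 0.87" example.
    return keep, [f"{labels[i]} {scores[i]:.2f}" for i in keep]


keep, text = filter_and_label(
    3,
    labels=["car", "car", "pedestrian"],
    scores=[0.87, 0.12, 0.55],
    score_threshold=0.3,
)
print(keep)  # [0, 2]
print(text)  # ['car 0.87', 'pedestrian 0.55']
```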

vision3d.viz.log_cameras(entity_prefix, images, intrinsics=None, extrinsics=None, *, jpeg_quality=None)

Log all camera images with optional pinhole projection to Rerun.

Each camera is logged to {entity_prefix}_{i}.

Parameters:

entity_prefix (str) – Rerun entity path prefix (e.g. "world/cam").
images (CameraImages | Tensor) – Camera images [N_cams, C, H, W].
intrinsics (CameraIntrinsics | Tensor | None) – Intrinsic matrices [N_cams, 3, 3].
extrinsics (CameraExtrinsics | Tensor | None) – Extrinsic matrices [N_cams, 4, 4] (lidar-to-camera).
jpeg_quality (int | None) – If set, JPEG-encode each image at this quality (0-100) before logging. None (default) logs uncompressed.

Return type:

None
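
To make the roles of the two matrices concrete, here is a minimal sketch of a pinhole projection for one camera: the 4x4 lidar-to-camera extrinsic moves a lidar-frame point into the camera frame, and the 3x3 intrinsic maps it to pixels. It assumes the common convention that the camera looks along +z; project_to_pixel and the example matrices are hypothetical and not part of vision3d.

```python
from typing import Optional, Sequence

Matrix = Sequence[Sequence[float]]


def project_to_pixel(
    point_lidar: Sequence[float], extrinsic: Matrix, intrinsic: Matrix
) -> Optional[tuple[float, float]]:
    """Hypothetical helper: project a lidar-frame xyz point to pixel (u, v).

    Returns None when the point lies on or behind the image plane.
    """
    x, y, z = point_lidar
    homogeneous = (x, y, z, 1.0)
    # Lidar frame -> camera frame, using the top three rows of the 4x4 matrix.
    xc, yc, zc = (
        sum(extrinsic[r][c] * homogeneous[c] for c in range(4)) for r in range(3)
    )
    if zc <= 0:
        return None
    # Camera frame -> pixels: apply the 3x3 intrinsics, then divide by depth.
    u = (intrinsic[0][0] * xc + intrinsic[0][1] * yc + intrinsic[0][2] * zc) / zc
    v = (intrinsic[1][0] * xc + intrinsic[1][1] * yc + intrinsic[1][2] * zc) / zc
    return u, v


# Example values only: identity extrinsic, 1000 px focal length,
# principal point at (640, 360).
E = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
K = [[1000.0, 0.0, 640.0], [0.0, 1000.0, 360.0], [0.0, 0.0, 1.0]]
print(project_to_pixel((1.0, 0.5, 10.0), E, K))  # (740.0, 410.0)
```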

vision3d.viz.log_point_cloud(entity, points, *, color_by_distance=True)

Log a point cloud to Rerun.

Parameters:

entity (str) – Rerun entity path (e.g. "world/lidar").
points (PointCloud3D | Tensor) – Point cloud [N, 3+C]. First 3 columns are xyz.
color_by_distance (bool) – Color points by distance from origin.

Return type:

None
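
One plausible reading of color_by_distance is sketched below: compute each point's Euclidean distance from the origin using only the xyz columns, then normalize so the value can drive a colormap. The normalization by the farthest point is an assumption; the source does not specify the colormap or value range that vision3d actually uses, and distance_fractions is a hypothetical helper.

```python
import math
from typing import Sequence


def distance_fractions(points: Sequence[Sequence[float]]) -> list[float]:
    """Hypothetical helper: map each point's distance from the origin to [0, 1]."""
    # Only the first three columns are coordinates; extra channels are ignored.
    distances = [math.sqrt(p[0] ** 2 + p[1] ** 2 + p[2] ** 2) for p in points]
    farthest = max(distances, default=0.0)
    if farthest == 0.0:
        return [0.0 for _ in distances]
    # Assumption: normalize by the farthest point in the cloud.
    return [d / farthest for d in distances]


cloud = [
    [3.0, 4.0, 0.0, 0.9],   # xyz plus one extra channel (e.g. intensity)
    [0.0, 0.0, 10.0, 0.1],
]
print(distance_fractions(cloud))  # [0.5, 1.0]
```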

vision3d.viz.log_sample(inputs, targets=None, *, predictions=None, entity_prefix='world', label_to_id=None, score_threshold=None, jpeg_quality=None)

Log a full sample dict to Rerun.

Convenience function that dispatches to type-specific loggers. Ground truth and predictions are logged to separate entities ({entity_prefix}/gt/boxes and {entity_prefix}/pred/boxes) so they can be toggled independently; both keep per-class colors and are distinguished by fill style (ground truth as translucent colored faces, predictions as a wireframe).

Parameters:

inputs (SampleInputs) – SampleInputs with "points", "images", "extrinsics", "intrinsics" keys.
targets (SampleTargets | None) – Optional SampleTargets with "boxes", "labels" keys (ground truth).
predictions (Prediction3D | None) – Optional Prediction3D with "boxes", "scores", "labels" keys. Prediction labels show their score.
entity_prefix (str) – Rerun entity path prefix.
label_to_id (dict[str, int] | None) – Mapping from class name to class ID for consistent coloring. Build this across all frames before logging.
score_threshold (float | None) – If set, predictions below this score are dropped.
jpeg_quality (int | None) – If set, JPEG-encode camera images at this quality (0-100) before logging. See log_cameras().

Return type:

None
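
The advice to build label_to_id across all frames before logging can be sketched as follows. The helper is hypothetical and assumes each frame's labels are available as class-name strings; sorting the names is one simple way to keep the mapping independent of frame order, not necessarily what vision3d expects.

```python
from typing import Iterable, Sequence


def build_label_to_id(frames: Iterable[Sequence[str]]) -> dict[str, int]:
    """Hypothetical helper: one stable integer ID per class name seen in any frame."""
    names = set()
    for frame_labels in frames:
        names.update(frame_labels)
    # Sorting makes the mapping independent of the order frames are visited.
    return {name: class_id for class_id, name in enumerate(sorted(names))}


frames = [["car", "pedestrian"], ["car"], ["cyclist", "car"]]
print(build_label_to_id(frames))  # {'car': 0, 'cyclist': 1, 'pedestrian': 2}
```

Building the mapping per frame instead would give "cyclist" a different ID depending on which classes happen to appear, so colors would shift between frames.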