Hi, I found a possible inconsistency in the coordinate convention for extrinsics used in AnySplat, which might cause confusion, especially when integrating VGGT.
After checking both VGGT’s data and model outputs, as well as how the losses are computed, I confirmed that VGGT uses world-to-camera (w2c) extrinsics consistently throughout the pipeline.
However, in AnySplat the situation appears reversed: the input data and the encoder outputs both use camera-to-world (c2w) extrinsics, yet one part of the code assumes the w2c convention.
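For readers less familiar with the two conventions, here is a minimal sketch (toy values, not taken from either codebase) of how c2w and w2c extrinsics relate: one is simply the inverse of the other, and for a rigid transform the inverse has the closed form `[Rᵀ | -Rᵀ t]`.

```python
import numpy as np

# Hypothetical 4x4 camera-to-world extrinsic (c2w): maps camera
# coordinates into world coordinates.
R = np.eye(3)                   # rotation (identity for this toy example)
t = np.array([1.0, 2.0, 3.0])   # camera position in world coordinates

c2w = np.eye(4)
c2w[:3, :3] = R
c2w[:3, 3] = t

# The world-to-camera (w2c) matrix is the inverse; for a rigid
# transform this equals [R^T | -R^T t].
w2c = np.linalg.inv(c2w)

# Round-tripping recovers the identity.
assert np.allclose(w2c @ c2w, np.eye(4))
```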
Specifically, in the training setup:
```python
if self.model.encoder.pred_pose:
    self.loss_pose = HuberLoss(
        alpha=self.train_cfg.pose_loss_alpha,
        delta=self.train_cfg.pose_loss_delta,
    )
```
This loss calls:
```python
GT_pose_enc = extri_intri_to_pose_encoding(
    context_extrinsics, context_intrinsics, image_size_hw
)
```
According to the function’s intended design, extri_intri_to_pose_encoding expects world-to-camera (w2c) extrinsics. However, context_extrinsics in AnySplat are actually camera-to-world (c2w).
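If the extrinsics really are c2w at that call site, one possible fix is to invert them before encoding. The helper below is a hypothetical sketch (not part of the AnySplat codebase) assuming `context_extrinsics` is a batch of 4x4 c2w matrices:

```python
import numpy as np

def c2w_to_w2c(extrinsics: np.ndarray) -> np.ndarray:
    """Hypothetical helper: invert a batch of (N, 4, 4) camera-to-world
    matrices to world-to-camera. Not part of the AnySplat codebase."""
    return np.linalg.inv(extrinsics)

# Hypothetical call-site fix, assuming context_extrinsics are c2w:
# GT_pose_enc = extri_intri_to_pose_encoding(
#     c2w_to_w2c(context_extrinsics),  # convert to w2c first
#     context_intrinsics, image_size_hw)
```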
Although this pose loss is not currently used in the final training pipeline, the mismatch makes the coordinate conventions harder to interpret and can mislead users who try to integrate or debug extrinsics-related features.
Thanks for your great work on this project! A clarification of the extrinsics coordinate flow throughout the codebase would be very helpful for users working on pose alignment and dataset integration.