Adjust clamping for rotated bboxes #9112

Status: Open (3 commits to be merged into base: main)


@AntoineSimoulin (Member) commented Jun 20, 2025

Adjust clamping for Rotated Boxes

This PR is a follow-up to #9104, aiming to address inconsistencies in the clamping function and improve its intuitiveness. The initial approach for clamping rotated bounding boxes focused on finding the largest angle-preserving box enclosed within the original box and the image canvas. However, as illustrated in Figure 2, this method can lead to non-intuitive results where the box does not fully enclose the underlying object. To address this issue, this PR proposes an adjustment to the clamping function. Instead of seeking the largest angle-preserving box, we now aim to find the smallest angle-preserving box that encloses the intersection of the original box and the image canvas. This change ensures that the resulting box is more intuitive.

These adjustments have some key implications. With this new approach, clamped rotated boxes may have vertices outside the canvas. However, the center of the bounding box is guaranteed to remain within the canvas. This PR addresses #8254 by ensuring that rotated bounding boxes SHOULD be clamped (consistent with un-rotated boxes). Crucially, as illustrated in Figure 1, the clamping operation preserves the original box's pixel assignments within the image canvas, ensuring that no information is lost during the process.
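The idea described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the code in this PR: the helper names (`box_corners`, `clip_to_canvas`, `clamp_rotated_box`) are hypothetical, the box is given as `(cx, cy, w, h, angle_deg)`, and the sign convention for the angle is an assumption that may differ from torchvision's. The sketch clips the box polygon to the canvas, then takes the extent of that intersection along the box's own axes.

```python
import math


def box_corners(cx, cy, w, h, angle_deg):
    """Four corners of a rotated box (angle sign convention is an assumption)."""
    a = math.radians(angle_deg)
    ux, uy = math.cos(a), math.sin(a)  # unit vector along the box width
    vx, vy = -uy, ux                   # unit vector along the box height
    return [
        (cx + sx * w / 2 * ux + sy * h / 2 * vx,
         cy + sx * w / 2 * uy + sy * h / 2 * vy)
        for sx, sy in ((-1, -1), (1, -1), (1, 1), (-1, 1))
    ]


def clip_to_canvas(poly, width, height):
    """Sutherland-Hodgman clipping of a convex polygon to [0, width] x [0, height]."""
    # Each entry: (coordinate index, bound, keep points with coordinate >= bound?)
    half_planes = [(0, 0.0, True), (0, float(width), False),
                   (1, 0.0, True), (1, float(height), False)]
    for axis, bound, keep_ge in half_planes:
        if not poly:
            break

        def inside(p):
            return p[axis] >= bound if keep_ge else p[axis] <= bound

        out = []
        for i, cur in enumerate(poly):
            prev = poly[i - 1]
            if inside(cur) != inside(prev):
                # The edge crosses the boundary: add the crossing point.
                t = (bound - prev[axis]) / (cur[axis] - prev[axis])
                out.append((prev[0] + t * (cur[0] - prev[0]),
                            prev[1] + t * (cur[1] - prev[1])))
            if inside(cur):
                out.append(cur)
        poly = out
    return poly


def clamp_rotated_box(cx, cy, w, h, angle_deg, width, height):
    """Smallest box with the same angle that encloses (box ∩ canvas)."""
    poly = clip_to_canvas(box_corners(cx, cy, w, h, angle_deg), width, height)
    if not poly:
        return None  # the box does not overlap the canvas at all
    a = math.radians(angle_deg)
    ux, uy = math.cos(a), math.sin(a)
    vx, vy = -uy, ux
    # Project the intersection polygon onto the box's own axes.
    us = [x * ux + y * uy for x, y in poly]
    vs = [x * vx + y * vy for x, y in poly]
    u_mid, v_mid = (min(us) + max(us)) / 2, (min(vs) + max(vs)) / 2
    return (u_mid * ux + v_mid * vx, u_mid * uy + v_mid * vy,
            max(us) - min(us), max(vs) - min(vs), angle_deg)


# A 40x20 box rotated by 45 degrees, centered on the left edge of a 100x100 canvas.
print(clamp_rotated_box(0, 50, 40, 20, 45, 100, 100))
```

In this example the clamped box is shorter along its width, its center moves inside the canvas, and one of its corners still lies outside the canvas, which matches the behavior described above. For an unrotated box the sketch reduces to the usual axis-aligned clamping.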

Details of the adjustments

This PR implements in particular the following modifications:

  • Modify the conditions in the clamping function so that the resulting box completely encloses the part of the input box that lies within the canvas. The output of the clamping operation is the smallest angle-preserving box that encloses the intersection of the original box and the image canvas.
  • Modify elastic_bounding_boxes for rotated boxes so that it uses the "CXCYWHR" format instead of "XYXYXYXY". The elastic transform needs the transformed points to be within the canvas size. This is the case for the center of rotated boxes but not necessarily for all vertices.
  • Fix _order_bounding_boxes_points in the case of largest negative values along the y-axis.
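To illustrate why the center-based "CXCYWHR" format is the safer input for the elastic transform, here is a small sketch that expands a box into its eight "XYXYXYXY" coordinates and checks which points fall inside the canvas. The function names are hypothetical and the angle sign convention is an assumption; torchvision's own conversion may order the points or orient the angle differently.

```python
import math


def cxcywhr_to_xyxyxyxy(cx, cy, w, h, angle_deg):
    """Expand (cx, cy, w, h, angle) into a flat [x1, y1, ..., x4, y4] list.

    Assumption: the angle rotates the width axis by +angle_deg in (x, y)
    coordinates; the real library convention may use the opposite sign.
    """
    a = math.radians(angle_deg)
    ux, uy = math.cos(a), math.sin(a)  # direction of the box width
    vx, vy = -uy, ux                   # direction of the box height
    flat = []
    for sx, sy in ((-1, -1), (1, -1), (1, 1), (-1, 1)):
        flat.append(cx + sx * w / 2 * ux + sy * h / 2 * vx)
        flat.append(cy + sx * w / 2 * uy + sy * h / 2 * vy)
    return flat


def point_in_canvas(x, y, width, height):
    """True if the point lies within [0, width] x [0, height]."""
    return 0 <= x <= width and 0 <= y <= height


# A box whose center is inside a 100x100 canvas but which sticks out on the left.
box = (3.5, 53.5, 30, 20, 45)
pts = cxcywhr_to_xyxyxyxy(*box)
center_ok = point_in_canvas(box[0], box[1], 100, 100)
vertices_ok = [point_in_canvas(pts[i], pts[i + 1], 100, 100) for i in range(0, 8, 2)]
print(center_ok, vertices_ok)
```

Here the center passes the canvas check while at least one vertex does not, so a transform that requires in-canvas points can operate on the center but not on every vertex.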

Illustration of the adjustments

We illustrate the adjustments to the clamping function using this example image. The clamping should be more intuitive and should prevent information from being lost.

Figure 1: Illustration of the clamping adjustments (original box in grey and corresponding clamped box in blue).

Figure 2: Illustration of the clamping BEFORE this PR.

Figure 3: Illustration of the clamping AFTER this PR.

Test plan

Please run the following tests:

pytest test/test_transforms_v2.py -k box -v
...
2372 passed, 1432 skipped, 5025 deselected in 46.08s


pytorch-bot bot commented Jun 20, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/vision/9112

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 1 Unrelated Failure

As of commit c6b365b with merge base 6bbe010:

NEW FAILURE - The following job has failed:

BROKEN TRUNK - The following job failed but was already failing on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@NicolasHug (Member) commented

Thanks for the PR @AntoineSimoulin, and for the detailed pictures!

It's clear from Figure 2 that our current clamping strategy leads to sub-optimal boxes. Out of curiosity, could you share the transformations that were used in each result? I suspect that the more transforms are used, the more clamping happens, and thus more information is lost.

The clamping strategy proposed in this PR allows for some corners of the box to be outside of the image canvas. That makes me wonder: what do we actually want from a clamping operation? Do we want the corners to be within the canvas, or do we only need the center of the box to be within the canvas?

My current understanding is that there is a spectrum of clamping strategies:

  • no clamp at all. This is what retains the most information.
  • a strict clamping, where we force all of the box points to be in the canvas, as implemented in main. Potentially, a lot of information is lost.
  • a more lenient clamping as in this PR, which seems to be an intermediate strategy between the 2 strategies above: we lose less information than with strict clamping, but we may still have points outside of the canvas.

I do agree that the clamping in this PR leads to less surprising results than the strict clamping we have in main. Maybe we could expose it as one of multiple clamping strategies. However, since it still results in points outside the canvas and some information loss, I wonder if users wouldn't prefer the no-clamping strategy in general?

@AntoineSimoulin (Member, Author) commented

> Out of curiosity, could you share the transformations that were used in each result?

Figures 2 and 3 were obtained by applying a CenterCrop transformation with sizes 300, 500, and 1000, and with the original image size.
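For reference, here is a rough sketch of how a center crop affects a rotated box before any clamping is applied. It is an illustration only: `center_crop_box` is a hypothetical helper, the image size and box values are made up, the offset rounding is an assumption, and the sketch only covers crops that are no larger than the image.

```python
def center_crop_box(box, canvas_hw, size):
    """Shift a (cx, cy, w, h, angle) box into the frame of a square center crop.

    Assumptions: the crop is no larger than the image, and the crop offset is
    computed by rounding half of the size difference.
    """
    height, width = canvas_hw
    if size > height or size > width:
        raise ValueError("this sketch does not handle crops larger than the image")
    top = int(round((height - size) / 2.0))
    left = int(round((width - size) / 2.0))
    cx, cy, w, h, angle = box
    # Only the center moves; width, height and angle are unchanged by a crop.
    return (cx - left, cy - top, w, h, angle), (size, size)


# Hypothetical 1200x1600 image with one rotated box.
image_hw = (1200, 1600)
box = (800, 600, 700, 300, 30)
for size in (300, 500, 1000):
    print(size, center_crop_box(box, image_hw, size))
```

The smaller the crop, the larger the part of the box that ends up outside the new canvas, so the choice of clamping strategy matters most for the smallest crop sizes.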

> My current understanding is that there is a spectrum of clamping strategies

Yeah I do agree with the proposed breakdown.

> Maybe we could expose it as one of multiple clamping strategies. However, since it still results in points outside the canvas and some information loss, I wonder if users wouldn't prefer the no-clamping strategy in general?

As illustrated in Figure 1, I feel the strategy proposed in this PR offers the best trade-off. For instance, in the case of object detection, it would be very difficult for a model to predict a vertex very far from the canvas boundaries. Also, this transformation ensures that the center of the box is within the image canvas, and therefore we should be able to apply any transformation without error. Finally, contrary to stricter clamping, we do not lose information, as all pixels within the canvas assigned to the object are still within the bounding box.

I would prefer to make this strategy the default and not keep an implementation of the other options for now, to keep the codebase simple. Let me know what you think!
