benchmark
c4510c79 - Add IoU-based accuracy checking for inductor tests segmentation models (#171927)

Commit

19 days ago

Add IoU-based accuracy checking for inductor tests segmentation models (#171927)

Summary:

# Add IoU-based accuracy checking for segmentation models

### Summary

Introduces the IoU (Intersection over Union) metric for boolean mask accuracy checking in inductor benchmarks. This provides a more appropriate accuracy comparison for segmentation models like SAM that output boolean masks. These tests block viable/strict, so there is an interest in maintaining their quality.

### Problem

The `sam` model was failing accuracy checks intermittently in CI (`inductor-test / test (inductor_torchbench, *, *, linux.g5.4xlarge.nvidia.gpu)`):

```
sam FAIL: accuracy=fail_accuracy, expected=pass
```

The error logs showed:

```
Accuracy failed: uint8 tensor did not match
Accuracy failed for key name masks
```

**Root cause:** Segmentation models output boolean masks that are derived by thresholding floating-point values. Small numerical differences (e.g., 0.4999 vs 0.5001) can cause pixels to flip between `True` and `False`. The existing accuracy check requires exact boolean matching, which is too strict for this use case.

### Solution

Instead of suppressing the failures (via `flaky_models` or `non_deterministic`), this PR implements a semantically appropriate comparison method:

- **IoU (Intersection over Union)** - A standard metric for comparing segmentation masks
- Models can be configured to use an IoU ≥ 0.99 threshold for boolean tensor comparison
- This catches real accuracy problems while allowing minor pixel-level variations

### Changes

1. **`benchmarks/dynamo/torchbench.yaml`**
   - Added `tolerance.use_iou_for_bool_masks` config list for models that should use IoU
2. **`benchmarks/dynamo/torchbench.py`**
   - Added `use_iou_for_bool_accuracy()` method to `TorchBenchmarkRunner`
3. **`benchmarks/dynamo/common.py`**
   - Added base `use_iou_for_bool_accuracy()` method to `BenchmarkRunner`
   - Pass new flag to `same()` function
4. **`torch/_dynamo/utils.py`**
   - Added `use_iou_for_bool` parameter to `same()` function
   - Implemented IoU comparison logic for boolean tensors:

   ```python
   intersection = (ref & res).sum().float()
   union = (ref | res).sum().float()
   iou = intersection / union
   # Pass if IoU >= 0.99 (intersection covers at least 99% of the union)
   ```

### Models enabled for IoU comparison

- `sam` - Segment Anything Model
- `sam_fast` - Fast variant of SAM
- `vision_maskrcnn` - Mask R-CNN (also outputs segmentation masks)

### Why IoU over alternatives?

| Approach | Pros | Cons |
|----------|------|------|
| `flaky_models` | Visible failures, doesn't block CI | Doesn't fix the underlying issue |
| `non_deterministic` | Simple config | Silently passes all failures, hides real problems |
| **IoU (this PR)** | Semantically correct metric, catches real bugs | Slightly more code |

### Test Plan

- Models in `use_iou_for_bool_masks` will use IoU ≥ 0.99 for boolean tensor comparison
- Real accuracy problems (IoU < 0.99) will still fail
- CI should no longer flake on `sam` model accuracy checks
- `sam_fast` can now be verified for accuracy and we can detect regressions

X-link: https://github.com/pytorch/pytorch/pull/171927
Approved by: https://github.com/malfet, https://github.com/yangw-dev

Reviewed By: atalman

Differential Revision: D90364697

fbshipit-source-id: 9b25f18ebbe6079a245af1fc7c3660cb9f171e34
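
To illustrate the root cause and the IoU check described in the commit message, here is a minimal pure-Python sketch. It is not the code in `torch/_dynamo/utils.py`: the helper names (`threshold_mask`, `mask_iou`, `masks_match`), the 0.5 cutoff, the toy score values, and the handling of an empty union are all assumptions made for illustration. Only the IoU formula and the 0.99 threshold come from the commit message.

```python
from typing import Sequence

IOU_THRESHOLD = 0.99  # threshold quoted in the commit message


def threshold_mask(scores: Sequence[float], cutoff: float = 0.5) -> list:
    """Turn per-pixel scores into a boolean mask (hypothetical helper)."""
    return [s > cutoff for s in scores]


def mask_iou(ref: Sequence[bool], res: Sequence[bool]) -> float:
    """Intersection over Union of two equally sized boolean masks."""
    if len(ref) != len(res):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(a and b for a, b in zip(ref, res))
    union = sum(a or b for a, b in zip(ref, res))
    # Assumption: two empty masks count as identical. The commit message
    # does not say how the real implementation treats an empty union.
    return 1.0 if union == 0 else intersection / union


def masks_match(ref: Sequence[bool], res: Sequence[bool],
                threshold: float = IOU_THRESHOLD) -> bool:
    """Pass when the masks overlap at least as much as the threshold requires."""
    return mask_iou(ref, res) >= threshold


# Toy data: 1000 pixels, one of which sits right at the cutoff and lands on
# different sides of it in the two runs (0.4999 vs 0.5001).
eager_scores = [0.9] * 500 + [0.1] * 499 + [0.4999]
compiled_scores = [0.9] * 500 + [0.1] * 499 + [0.5001]

eager_mask = threshold_mask(eager_scores)
compiled_mask = threshold_mask(compiled_scores)

exact_match = eager_mask == compiled_mask   # one flipped pixel breaks equality
iou = mask_iou(eager_mask, compiled_mask)   # 500 / 501

print(f"exact match: {exact_match}, IoU: {iou:.4f}, "
      f"passes: {masks_match(eager_mask, compiled_mask)}")
```

In this toy case the exact comparison fails because of a single borderline pixel, while the IoU of 500/501 stays above 0.99. A mask that disagrees on 10% of its foreground pixels would fall well below the threshold and still be reported as a failure.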

Author

jeanschmidt

Committer

meta-codesync[bot]

Parents

665b72ec
