pytorch
7549f901 - `__launch_bounds__` for `torch.mode` with CUDA 11.7 (#79710)

2 years ago
This is a temporary fix for `TestReductionsCUDA.test_mode_large_cuda`, which fails with CUDA 11.7 due to the following:

```
Traceback (most recent call last):
  File "/opt/pytorch/pytorch/torch/testing/_internal/common_utils.py", line 1805, in wrapper
    method(*args, **kwargs)
  File "/opt/pytorch/pytorch/torch/testing/_internal/common_utils.py", line 1805, in wrapper
    method(*args, **kwargs)
  File "/opt/pytorch/pytorch/torch/testing/_internal/common_device_type.py", line 390, in instantiated_test
    raise rte
  File "/opt/pytorch/pytorch/torch/testing/_internal/common_device_type.py", line 377, in instantiated_test
    result = test(self, **param_kwargs)
  File "/opt/pytorch/pytorch/torch/testing/_internal/common_device_type.py", line 943, in only_fn
    return fn(slf, *args, **kwargs)
  File "test_reductions.py", line 891, in test_mode_large
    testset_for_shape((10, 2048), 10)
  File "test_reductions.py", line 883, in testset_for_shape
    self._test_mode_intervals(shape, [(i, d - i)], device)
  File "test_reductions.py", line 870, in _test_mode_intervals
    values, indices = torch.mode(x, -1, False)
RuntimeError: CUDA error: too many resources requested for launch
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
```

cc @ptrblck

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79710
Approved by: https://github.com/malfet
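The "too many resources requested for launch" error (`cudaErrorLaunchOutOfResources`) commonly means that registers per thread multiplied by threads per block exceeds the register file available to one block. The CUDA qualifier `__launch_bounds__(maxThreadsPerBlock, minBlocksPerMultiprocessor)` tells the compiler the largest block size a kernel will be launched with, so it can cap register usage (spilling to local memory if necessary) instead of the launch failing at runtime. The sketch below models only that arithmetic. It is not taken from the commit: the 65536-registers-per-block and 255-registers-per-thread limits are assumptions that hold for many, but not all, GPU architectures, register allocation granularity is ignored, and the kernel numbers in the example are hypothetical rather than the actual `torch.mode` kernel's.

```python
# Simplified model of the per-block register budget behind
# "too many resources requested for launch".
# ASSUMPTIONS (illustrative, not from the commit):
#   - 65536 32-bit registers available per thread block
#   - 255 registers per thread is the hardware maximum
#   - register allocation granularity is ignored

REGS_PER_BLOCK = 65536
MAX_REGS_PER_THREAD = 255


def launch_fits(regs_per_thread: int, threads_per_block: int) -> bool:
    """Return True if a block of this size fits in the register budget."""
    return regs_per_thread * threads_per_block <= REGS_PER_BLOCK


def register_cap(max_threads_per_block: int) -> int:
    """Registers per thread the compiler must stay within so that a block
    of max_threads_per_block threads can always launch.

    This mirrors the intent of __launch_bounds__(maxThreadsPerBlock): with
    the block size known at compile time, the compiler can limit register
    usage up front instead of the launch failing at runtime.
    """
    return min(MAX_REGS_PER_THREAD, REGS_PER_BLOCK // max_threads_per_block)


if __name__ == "__main__":
    # Hypothetical kernel compiled to 72 registers/thread, launched with 1024 threads.
    print(launch_fits(72, 1024))                  # False: 72 * 1024 = 73728 > 65536
    print(register_cap(1024))                     # 64
    print(launch_fits(register_cap(1024), 1024))  # True: 64 * 1024 = 65536
```

Under these assumptions, a 1024-thread block leaves at most 64 registers per thread; without a launch bound, the compiler does not know the block size and may allocate more than that, which only shows up as a failure when the kernel is launched.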