Generalize HIP-specific launch bounds to apply to CUDA as well (#56143)
Summary:
Launch bounds were added for HIP along the way, but smaller CUDA devices (like Jetson) also benefit from them.
This PR goes over the HIP-specific launch bounds and generalizes them to cover CUDA as well.
The long-term goal is good coverage of our kernels with launch-bound annotations, so that we eventually no longer need ad-hoc adaptations such as the block-size reduction discussed in https://github.com/pytorch/pytorch/issues/8103.
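For context, a minimal sketch of what a launch-bound annotation looks like (a hypothetical kernel, not code from this PR; the bound values are illustrative and are chosen per kernel in practice). `__launch_bounds__` tells the compiler the maximum block size a kernel will ever be launched with, letting it budget registers accordingly instead of spilling or forcing callers to shrink the block; the same attribute is honored by both nvcc and HIP:

```cuda
// Assumed illustrative caps; real kernels pick values matching
// how they are actually launched.
constexpr int kMaxThreadsPerBlock = 256;  // upper bound on blockDim.x
constexpr int kMinBlocksPerSM = 4;        // occupancy hint (optional arg)

// The annotation constrains register allocation so the kernel can
// reach the stated occupancy even on smaller devices.
__global__ void __launch_bounds__(kMaxThreadsPerBlock, kMinBlocksPerSM)
add_kernel(const float* a, const float* b, float* out, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) {
    out[i] = a[i] + b[i];
  }
}
```

Launching such a kernel with a block size above `kMaxThreadsPerBlock` is a launch error, which is why generalizing HIP-only bounds to CUDA requires checking each kernel's actual launch configuration.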
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56143
Reviewed By: agolynski
Differential Revision: D27804640
Pulled By: ngimel
fbshipit-source-id: d4c345f9f7503e050a46361bfe2625865d0a42ba