add --gpu-max-threads-per-block=256 to hipMAGMA build (#54161)
Summary:
As of ROCm version 4.0.1, the HIP compiler's default maximum number of threads per block is 256, but this default is subject to change in future releases. To protect against such a change, hipMAGMA should be built with the previously assumed default set explicitly via `--gpu-max-threads-per-block=256`. This change is necessary here in PyTorch until the upstream MAGMA project uses `__launch_bounds__` or some other means of controlling launch bounds.
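For reference, a minimal sketch of the per-kernel alternative mentioned above, shown in CUDA syntax (HIP accepts the same `__launch_bounds__` attribute). The kernel `scale_kernel` and the `MAX_THREADS_PER_BLOCK` macro are hypothetical names used for illustration only; they are not taken from MAGMA or PyTorch.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical constant: the block-size limit this kernel is compiled for.
#define MAX_THREADS_PER_BLOCK 256

// __launch_bounds__ tells the compiler the largest block size the kernel
// will be launched with, instead of relying on a compiler-wide default.
__global__ void __launch_bounds__(MAX_THREADS_PER_BLOCK)
scale_kernel(float* data, float alpha, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        data[i] *= alpha;
    }
}

int main() {
    const int n = 1024;
    float host[n];
    for (int i = 0; i < n; ++i) host[i] = 1.0f;

    float* dev = nullptr;
    cudaMalloc(&dev, n * sizeof(float));
    cudaMemcpy(dev, host, n * sizeof(float), cudaMemcpyHostToDevice);

    // Launch with a block size that does not exceed the declared bound.
    const int threads = MAX_THREADS_PER_BLOCK;
    const int blocks = (n + threads - 1) / threads;
    scale_kernel<<<blocks, threads>>>(dev, 2.0f, n);

    cudaMemcpy(host, dev, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dev);

    printf("host[0] = %f\n", host[0]);
    return 0;
}
```

With a bound declared per kernel like this, the build would presumably no longer depend on the compiler-wide default that `--gpu-max-threads-per-block` pins.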
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54161
Reviewed By: zou3519
Differential Revision: D27194829
Pulled By: malfet
fbshipit-source-id: 8be2cff3b38786526954b627ff6ab02b510040a1