[RelEng] Define `BUILD_BUNDLE_PTXAS` (#119750)
That would bundle PTXAS into a `bin` folder
When compiling for Triton, define `TRITION_PTXAS_PATH` if `ptxas` is bundled with PyTorch Needed to make PyTorch compiled against CUDA-11.8 usable with 11.8 driver, as Triton is bundled with latest (CUDA-12.3 at time of PyTorch-2.2 release) ptxas
Needs https://github.com/pytorch/builder/commit/5c814e2527b3f5797488bf57d9d5425e63dcc1ac to produce valid binary builds
Test plan:
- Create dummy ptxas in `torch/bin` folder and observe `torch.compile` fail with backtrace in Triton module.
- Run following script (to be added to binary tests ) against CUDA-11.8 wheel:
```python
import torch
import triton
@torch.compile
def foo(x: torch.Tensor) -> torch.Tensor:
return torch.sin(x) + torch.cos(x)
x=torch.rand(3, 3, device="cuda")
print(foo(x))
# And check that CUDA versions match
cuda_version = torch.version.cuda
ptxas_version = triton.backends.nvidia.compiler.get_ptxas_version().decode("ascii")
assert cuda_version in ptxas_version, f"CUDA version mismatch: torch build with {cuda_version}, but Triton uses ptxs {ptxas_version}"
```
Fixes https://github.com/pytorch/pytorch/issues/119054
Pull Request resolved: https://github.com/pytorch/pytorch/pull/119750
Approved by: https://github.com/jansel, https://github.com/atalman