Patch CUDA 11.1 ptxas issue
Fixes #75708
`--ptxas-options` passes only its immediate argument to ptxas, so it has to be placed in front of every ptxas argument we pass.
It's strange that the previous form worked with CUDA Toolkit 11.6. I'm following up with the NVRTC team on this internally; in the meantime, we should merge this PR to avoid register failures in generated kernels.
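
For illustration only, here is a minimal Python sketch of the option layout described above, where each ptxas argument gets its own `--ptxas-options` prefix. The helper name `build_nvrtc_options` and the example argument values are hypothetical and are not taken from this PR's code.

```python
from typing import List


def build_nvrtc_options(ptxas_args: List[str]) -> List[str]:
    """Return a compile option list in which every ptxas argument
    is preceded by its own --ptxas-options flag.

    Hypothetical helper: it only illustrates the pairing rule and
    does not call NVRTC.
    """
    options: List[str] = []
    for arg in ptxas_args:
        # --ptxas-options forwards only the single argument that follows it,
        # so the flag is repeated before each ptxas argument.
        options.append("--ptxas-options")
        options.append(arg)
    return options


# Illustrative arguments (assumed values, not the ones used in the PR).
options = build_nvrtc_options(["--maxrregcount", "32"])
# options == ["--ptxas-options", "--maxrregcount", "--ptxas-options", "32"]
```

With a single leading `--ptxas-options`, only the first token would be forwarded to ptxas, which is the behavior this change works around.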
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76226
Approved by: https://github.com/davidberard98