[CUDA] Work around register spilling issue in mem-efficient SDP kernels on `sm60` (#120445)
We're seeing that a newer version of CUDA introduces register spilling in a few kernels on Pascal (`sm60`); this PR works around the spilling for that specific CUDA version only.
CC @ptrblck
Pull Request resolved: https://github.com/pytorch/pytorch/pull/120445
Approved by: https://github.com/Skylion007, https://github.com/drisspg