Update sdp guards for performance (#87241)
# Summary
Makes the contiguity check for the nested tensor (nt) input stricter and more correct, and makes some performance improvements to the checks.
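
For readers unfamiliar with what a contiguity check verifies, here is a minimal, generic sketch in plain Python. It is not the guard code changed in this PR and does not model nested tensors. The function name `is_contiguous` and its `shape`/`strides` arguments are hypothetical and chosen only for illustration. It assumes a dense row-major layout with strides measured in elements.

```python
from typing import Sequence


def is_contiguous(shape: Sequence[int], strides: Sequence[int]) -> bool:
    """Return True if `strides` describe a dense row-major layout for `shape`.

    Hypothetical helper for illustration only; strides are in elements.
    """
    if len(shape) != len(strides):
        raise ValueError("shape and strides must have the same length")

    # A tensor with no elements is treated as trivially contiguous.
    if any(size == 0 for size in shape):
        return True

    expected = 1  # stride the current dimension must have if the layout is dense
    # Walk from the innermost dimension outward.
    for size, stride in zip(reversed(shape), reversed(strides)):
        # The stride of a size-1 dimension never affects addressing, so skip it.
        if size != 1 and stride != expected:
            return False
        expected *= size
    return True


dense = is_contiguous((2, 3, 4), (12, 4, 1))   # dense row-major layout
transposed = is_contiguous((3, 4), (1, 3))     # transposed view of a 4x3 buffer
```

A stricter check in this spirit rejects any layout whose strides do not match the dense pattern exactly, rather than accepting inputs that merely look contiguous along one dimension.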
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87241
Approved by: https://github.com/cpuhrsch