[SDPA] Update dispatch checks to catch last_dim_stride != 1. Also update mask padding logic (#106102)
### <samp>🤖 Generated by Copilot at bb1fc29</samp>
This pull request refactors and simplifies the fused scaled dot product attention (SDPA) kernel code in `attention.cu` and `sdp_utils.cpp`, and adds new input validation checks and tests. It also changes the `sdp_params` struct to store optional mask tensors directly.
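
For readers unfamiliar with the condition named in the title, here is a minimal pure-Python sketch of what a last-dimension stride other than 1 looks like. It only models tensor shape/stride metadata; the helper names (`contiguous_strides`, `transpose_last_two`, `last_dim_is_dense`) are hypothetical and do not reproduce the actual checks in `sdp_utils.cpp`.

```python
from typing import Sequence, Tuple


def contiguous_strides(shape: Sequence[int]) -> Tuple[int, ...]:
    """Row-major (C-contiguous) strides, in elements, for the given shape."""
    strides = []
    step = 1
    for size in reversed(shape):
        strides.append(step)
        step *= size
    return tuple(reversed(strides))


def transpose_last_two(shape, strides):
    """Swap the last two dims of a view: only metadata changes, no data moves."""
    new_shape = tuple(shape[:-2]) + (shape[-1], shape[-2])
    new_strides = tuple(strides[:-2]) + (strides[-1], strides[-2])
    return new_shape, new_strides


def last_dim_is_dense(strides) -> bool:
    """Hypothetical stand-in for the dispatch check: last-dim stride must be 1."""
    return strides[-1] == 1


# Assumed layout for illustration: (batch, heads, seq_len, head_dim)
shape = (2, 8, 128, 64)
strides = contiguous_strides(shape)

# A transposed view keeps the same storage but its last dim is no longer dense.
t_shape, t_strides = transpose_last_two(shape, strides)

print(strides, last_dim_is_dense(strides))
print(t_strides, last_dim_is_dense(t_strides))
```

A contiguous query/key/value tensor passes this kind of check, while a view produced by transposing the last two dimensions does not, which is the sort of input the updated dispatch checks are meant to catch.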
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106102
Approved by: https://github.com/cpuhrsch