[fake_impls] fix max_seqlen return values in efficient_attention_forward (#120842)
To match the actual implementation, we should return max_seqlen_q/k, not M and N, in the sparse case:
https://github.com/pytorch/pytorch/blob/7e185277cd6fadada367fa6318beff9e4127570a/aten/src/ATen/native/transformers/cuda/attention.cu#L981-L996
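For illustration, here is a minimal sketch of the corrected logic (a simplification under assumed names, not the exact fake-impl source; `cu_seqlens_q`, `max_seqlen_q`, and `max_seqlen_k` follow the operator's argument names):

```python
# Minimal sketch of the fixed return values in the fake impl of
# aten::_efficient_attention_forward (illustrative; not the exact source).
def fake_max_seqlens(query, key, cu_seqlens_q, max_seqlen_q, max_seqlen_k):
    M = query.size(1)  # dense query sequence length
    N = key.size(1)    # dense key sequence length
    if cu_seqlens_q is not None:
        # Sparse case: the CUDA kernel returns the caller-provided
        # max_seqlen values, so the fake impl must match, not return M/N.
        return max_seqlen_q, max_seqlen_k if max_seqlen_k is not None else N
    # Dense case: the max sequence lengths are just M and N.
    return M, N
```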
Note that although the .cu file sets max_seqlen_k = 0 in the sparse case, what it actually returns is the incoming max_seqlen_k (or N when none is provided):
https://github.com/pytorch/pytorch/blob/7e185277cd6fadada367fa6318beff9e4127570a/aten/src/ATen/native/transformers/cuda/attention.cu#L1224-L1231
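In Python terms, the behavior we are matching looks roughly like this (an assumed paraphrase of the linked C++, not the literal code):

```python
# Assumed paraphrase of the attention.cu return path for max_seqlen_batch_k;
# simplified, not the literal C++.
def cuda_returned_max_seqlen_k(max_seqlen_k, N, sparse):
    max_seqlen_batch_k = 0 if sparse else N  # local zeroed in the sparse case
    # ...kernel launch elided; the zeroed local is not what gets returned...
    # The return statement hands back the incoming max_seqlen_k when it is
    # provided, otherwise N.
    return max_seqlen_k if max_seqlen_k is not None else N
```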
Tests: added in the next PR (#102839), which also fixes other parts of the test_fake tests so that we can un-xfail them and actually run them.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/120842
Approved by: https://github.com/YuqingJ
ghstack dependencies: #120682