[fake_impls] fix max_seqlen return values in efficient_attention_forward (#120842)
To match the actual implementation, we should return max_seqlen_q/k, not M and N, in the sparse case:
https://github.com/pytorch/pytorch/blob/7e185277cd6fadada367fa6318beff9e4127570a/aten/src/ATen/native/transformers/cuda/attention.cu#L981-L996
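For illustration, here is a minimal sketch of the corrected logic (a simplification under assumed names, not the exact fake-impl source; `cu_seqlens_q`, `max_seqlen_q`, and `max_seqlen_k` follow the operator's argument names):

```python
# Minimal sketch of the fixed return values in the fake impl of
# aten::_efficient_attention_forward (illustrative; not the exact source).
def fake_max_seqlens(query, key, cu_seqlens_q, max_seqlen_q, max_seqlen_k):
    M = query.size(1)  # dense query sequence length
    N = key.size(1)    # dense key sequence length
    if cu_seqlens_q is not None:
        # Sparse case: the CUDA kernel returns the caller-provided
        # max_seqlen values, so the fake impl must match, not return M/N.
        return max_seqlen_q, max_seqlen_k if max_seqlen_k is not None else N
    # Dense case: the max sequence lengths are just M and N.
    return M, N
```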
Note that although the .cu file sets max_seqlen_k = 0 in the sparse case, what it actually returns is the incoming max_seqlen_k (or N when none is provided):
https://github.com/pytorch/pytorch/blob/7e185277cd6fadada367fa6318beff9e4127570a/aten/src/ATen/native/transformers/cuda/attention.cu#L1224-L1231
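In Python terms, the behavior we are matching looks roughly like this (an assumed paraphrase of the linked C++, not the literal code):

```python
# Assumed paraphrase of the attention.cu return path for max_seqlen_batch_k;
# simplified, not the literal C++.
def cuda_returned_max_seqlen_k(max_seqlen_k, N, sparse):
    max_seqlen_batch_k = 0 if sparse else N  # local zeroed in the sparse case
    # ...kernel launch elided; the zeroed local is not what gets returned...
    # The return statement hands back the incoming max_seqlen_k when it is
    # provided, otherwise N.
    return max_seqlen_k if max_seqlen_k is not None else N
```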
Tests: added in the next PR (#102839), which also fixes other parts of the test_fake tests so that we can un-xfail them and actually run them.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/120842
Approved by: https://github.com/YuqingJ
ghstack dependencies: #120682