[meta registration] efficient_attention_forward fix for NT inputs (#120594)
When cu_seqlens_q is provided, we should use the user-specified max_seqlen_q instead of inferring it as query.size(1):
https://github.com/pytorch/pytorch/blob/1c7b0e7cd16cc1af4c72b1a863d63fd731e015a8/aten/src/ATen/native/transformers/cuda/attention.cu#L989
This wasn't caught because the value is used as ceil(max_seqlen / 32) * 32, and the OpInfo inputs were small enough that this came out to 32 in either case.
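
A minimal sketch of the corrected selection logic, assuming a standalone helper for illustration (the function and variable names below are hypothetical, not the actual meta registration code):

```python
import math

def _max_seqlen_q(query, cu_seqlens_q, max_seqlen_q):
    # Hypothetical helper mirroring the fix: prefer the caller-provided
    # max_seqlen_q when cu_seqlens_q is given (NT / varlen inputs),
    # instead of inferring it from the dense query shape.
    if cu_seqlens_q is not None:
        seqlen = max_seqlen_q
    else:
        seqlen = query.size(1)
    # Downstream the value is rounded up to a multiple of 32, which is why
    # small OpInfo inputs masked the bug: both paths rounded to 32.
    return math.ceil(seqlen / 32) * 32
```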
Differential Revision: [D54179733](https://our.internmc.facebook.com/intern/diff/D54179733)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/120594
Approved by: https://github.com/drisspg