Update xfails for scaled_dot_product_attention (#120928)
Update xfails for test_dispatch_meta_outplace and test_dispatch_symbolic_meta_outplace.
These tests are now expected to fail for some configurations because we moved the registrations from meta_registrations.py to fake_impls.py. AFAIK this is okay: fake tensors will still work, since fake_impls.py has special handling for these ops. The purpose of this PR is to update the xfails so that they correctly mark the tests that actually fail.
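For illustration only (not code from this PR), here is a minimal sketch showing that SDPA still traces under fake tensor mode; note that FakeTensorMode is an internal API and the backend selected under fake mode is platform-dependent:

```python
import torch
from torch._subclasses.fake_tensor import FakeTensorMode

# Under FakeTensorMode, SDPA is handled by the fake impls rather than by
# meta registrations, so shape propagation still works without real compute.
with FakeTensorMode():
    q = k = v = torch.randn(2, 4, 8, 16, dtype=torch.float16)
    out = torch.nn.functional.scaled_dot_product_attention(q, k, v)
    print(out.shape)  # torch.Size([2, 4, 8, 16])
```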
Previously, I set these to xfail only for bfloat16, float16, and float32 (but not float64); this isn't really correct. Explanation below:
Scaled dot product attention (SDPA) has multiple implementations: efficient_attention, flash_attention, and unfused (math) attention. flash_attention supports fp16 and bf16; efficient_attention supports fp16, bf16, and fp32; unfused attention supports all dtypes.
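As a hedged aside (not part of this PR), you can probe which backend/dtype combinations a machine supports by pinning SDPA to one backend at a time via torch.nn.attention.sdpa_kernel (available in recent PyTorch; older versions expose a similar torch.backends.cuda.sdp_kernel context manager):

```python
import torch
import torch.nn.functional as F
from torch.nn.attention import SDPBackend, sdpa_kernel

q = k = v = torch.randn(2, 4, 8, 16, device="cuda")

for backend in (SDPBackend.FLASH_ATTENTION, SDPBackend.EFFICIENT_ATTENTION, SDPBackend.MATH):
    for dtype in (torch.float16, torch.bfloat16, torch.float32, torch.float64):
        try:
            with sdpa_kernel(backend):  # restrict SDPA to this one backend
                F.scaled_dot_product_attention(q.to(dtype), k.to(dtype), v.to(dtype))
            status = "supported"
        except RuntimeError:
            status = "unsupported"
        print(f"{backend.name:>20} {str(dtype):<15} {status}")
```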
The efficient_attention and flash_attention implementations fail the meta tests, but unfused attention does not: it falls back to constituent ops that have registered meta kernels. A given platform may support neither, one, or both of efficient_attention and flash_attention.
So on CUDA, where all three implementations are available, bf16, fp16, and fp32 inputs select one of the fused implementations, and the test fails for those dtypes.
On ROCm, efficient_attention is not available, so fp32 uses the unfused implementation and the test passes.
Fix in this PR (see the sketch after the list):
* If any fused impl is available, then xfail float16 & bfloat16
* If efficient_attention is available, then also xfail float32
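A minimal sketch of that logic, assuming the PLATFORM_SUPPORTS_* constants from torch.testing._internal.common_cuda; the helper name sdpa_meta_xfail_dtypes is hypothetical, and the actual PR wires this into the OpInfo xfail decorators:

```python
import torch
from torch.testing._internal.common_cuda import (
    PLATFORM_SUPPORTS_FLASH_ATTENTION,
    PLATFORM_SUPPORTS_MEM_EFF_ATTENTION,
)

def sdpa_meta_xfail_dtypes():
    """Hypothetical helper: dtypes for which the meta dispatch tests should xfail."""
    dtypes = []
    # fp16/bf16 pick a fused impl whenever one exists, and the fused impls
    # lack meta registrations, so the meta tests fail for those dtypes.
    if PLATFORM_SUPPORTS_FLASH_ATTENTION or PLATFORM_SUPPORTS_MEM_EFF_ATTENTION:
        dtypes += [torch.float16, torch.bfloat16]
    # Among the fused impls, only efficient_attention covers fp32, so fp32
    # fails only where efficient_attention exists (e.g. CUDA, but not ROCm).
    if PLATFORM_SUPPORTS_MEM_EFF_ATTENTION:
        dtypes.append(torch.float32)
    return tuple(dtypes)
```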
Pull Request resolved: https://github.com/pytorch/pytorch/pull/120928
Approved by: https://github.com/drisspg