Adds ATen fallback for scaled_dot_product_attention (#21107)
### Description
Introduces an ATen fallback for
`torch.nn.functional.scaled_dot_product_attention`. This operator was
introduced in torch 2.0 and has since received many updates, including a
memory efficient attention implementation for V100 machines. The current
TorchScript exporter decomposes attention into an ONNX subgraph that does
not provide the same memory savings as PyTorch's memory efficient
attention kernel. Allowing a fallback to the PyTorch ATen op for attention
helps mitigate memory spikes for models that rely on memory efficient
attention.
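
For reference, the sketch below illustrates how such a fallback can be expressed through the TorchScript exporter: a custom symbolic emits ONNX Runtime's ATen bridge op instead of a decomposed subgraph, so the original PyTorch kernel is dispatched at runtime. The symbolic signature, domain, and opset version here are illustrative assumptions, not the exact registration added by this PR.

```python
# Rough sketch: route aten::scaled_dot_product_attention through ORT's
# ATen fallback op rather than letting the exporter decompose it.
import torch
import torch.onnx


def _sdpa_aten_fallback(g, query, key, value, attn_mask, dropout_p, is_causal, scale=None):
    # Emit the ATen bridge op; "operator_s" names the ATen function that
    # onnxruntime dispatches back to PyTorch at execution time.
    return g.op(
        "org.pytorch.aten::ATen",
        query,
        key,
        value,
        attn_mask,
        dropout_p,
        is_causal,
        operator_s="scaled_dot_product_attention",
    )


# Register the symbolic for the TorchScript exporter (opset 14 chosen
# arbitrarily for this sketch).
torch.onnx.register_custom_op_symbolic(
    "aten::scaled_dot_product_attention", _sdpa_aten_fallback, 14
)
```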
### Motivation and Context
Memory issues arose when integrating ONNX Runtime Training with AML
Stable Diffusion.
---------
Co-authored-by: root <prathikrao@microsoft.com>