[CPU] Optimize softmax as flash attention v2 (#118957)
### Description
Following flash attention v2, optimize the softmax computation by moving the division by the softmax sum out of the KV inner loop: inside the loop the output accumulator is kept unnormalized and only rescaled by the running-max correction factor, and the single division by the final sum happens once after the loop. A minimal sketch of the scheme follows.
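For reference, here is a minimal Python sketch of the rescaling trick, not the actual ATen C++ kernel; the function name, tensor shapes, and the `Bkv` block size are illustrative. A v1-style loop would divide the accumulator by the running sum on every iteration; the v2 scheme below defers that division until after the loop.
```python
import torch

def flash_attn_v2_softmax(q, k, v, Bkv=128):
    # q: (M, d), k/v: (N, d); Bkv is an illustrative KV block size.
    M, d = q.shape
    scale = d ** -0.5
    acc = torch.zeros(M, v.shape[1])           # unnormalized output accumulator
    row_max = torch.full((M,), float("-inf"))  # running row max
    row_sum = torch.zeros(M)                   # running softmax denominator
    for start in range(0, k.shape[0], Bkv):
        s = (q @ k[start:start + Bkv].T) * scale      # scores for this KV block
        new_max = torch.maximum(row_max, s.max(dim=1).values)
        corr = torch.exp(row_max - new_max)           # rescale factor for old state
        p = torch.exp(s - new_max[:, None])
        row_sum = row_sum * corr + p.sum(dim=1)
        # v1 would also divide `acc` by `row_sum` here; v2 keeps the
        # accumulator unnormalized and only applies the max correction.
        acc = acc * corr[:, None] + p @ v[start:start + Bkv]
        row_max = new_max
    return acc / row_sum[:, None]                     # single division after the loop

# Sanity check against the unblocked reference computation.
q, k, v = (torch.randn(64, 32) for _ in range(3))
ref = torch.softmax((q @ k.T) * 32 ** -0.5, dim=-1) @ v
assert torch.allclose(flash_attn_v2_softmax(q, k, v, Bkv=64), ref, atol=1e-5)
```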
### Performance
Stable Diffusion V2.1 on Intel Granite Rapids (GNR)
| Version | Kernel time (s) | Kernel time reduction |
|---------|-----------------|-----------------------|
| BF16 Before | 28.67 | |
| BF16 After | 23.55 | 17.86% |
| FP32 Before | 54.20 | |
| FP32 After | 49.47 | 8.73% |
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118957
Approved by: https://github.com/jgong5, https://github.com/drisspg