onnxruntime
7c3a2522 - [CUDA] update test_flash_attn_cuda.py for Windows (#21006)

1 year ago
Currently, test_flash_attn_cuda.py can only run on Linux. This is because it uses Triton for the rotary reference implementation, and the triton Python package is not available on Windows. This change updates the script so that the test can run on Windows, which lets us test memory efficient attention on Windows. Because of this limitation, rotary is excluded from testing on Windows.
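A minimal sketch of the kind of platform gating described above, assuming the script decides at import time whether the triton-based rotary reference can be used. `has_triton` and `rotary_options` are hypothetical names for illustration, not the actual helpers in test_flash_attn_cuda.py.

```python
import importlib.util
import platform


def has_triton() -> bool:
    # Hypothetical helper: triton is not available on Windows, so treat it
    # as unavailable there; elsewhere, check whether the package is installed.
    if platform.system() == "Windows":
        return False
    return importlib.util.find_spec("triton") is not None


def rotary_options() -> list:
    # Hypothetical helper: always test the non-rotary path, and add the
    # rotary path only when the triton-based reference implementation can run.
    return [False, True] if has_triton() else [False]


if __name__ == "__main__":
    for use_rotary in rotary_options():
        print(f"would run attention test with rotary={use_rotary}")
```

Checking for the package with `importlib.util.find_spec` avoids importing triton just to probe for it, and the non-rotary cases still run on every platform.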