Adds Triton Flash Attention 2 kernel #4337
initial commit (4318d771)
temp commit: needs debugging (95456d0e)
packed flash attn with mask works (2466fd9d)
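This commit gets the packed flash-attention kernel producing correct masked output. As a point of reference, a fused kernel like this computes standard masked scaled-dot-product attention; below is a minimal PyTorch sketch of that reference, not the PR's code, with a hypothetical helper name:

```python
# Hypothetical reference for validating a fused kernel; not the PR's code.
import math
import torch

def ref_masked_attention(q, k, v, mask=None):
    """Plain O(seq^2)-memory masked attention.

    q, k, v: [batch, heads, seq, head_dim]; mask is additive and
    broadcastable to [batch, heads, seq, seq] (0 = keep, -inf = drop).
    """
    scale = 1.0 / math.sqrt(q.shape[-1])
    scores = torch.matmul(q, k.transpose(-2, -1)) * scale
    if mask is not None:
        scores = scores + mask
    return torch.matmul(torch.softmax(scores, dim=-1), v)
```

A fused flash-attention path should match this reference up to floating-point tolerance, e.g. `torch.testing.assert_close(out, ref_masked_attention(q, k, v, mask), atol=1e-2, rtol=1e-2)` for fp16 inputs.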
clean-up (7e2e858e)
add bert/roberta tests to test_inference (b0d42400)
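A sketch of what a bert/roberta inference test can look like, assuming the Hugging Face pipeline pattern used elsewhere in DeepSpeed's test suite; the model names, the fill-mask task, and the `use_triton` flag are assumptions rather than lines from this PR:

```python
# Hypothetical test sketch; parameters are assumptions, not the PR's code.
import pytest
import torch
import deepspeed
from transformers import pipeline

@pytest.mark.parametrize("model_name", ["bert-base-cased", "roberta-base"])
def test_fill_mask_kernel_inject(model_name):
    pipe = pipeline("fill-mask", model=model_name, device=0)
    query = f"Paris is the {pipe.tokenizer.mask_token} of France."
    baseline = pipe(query)

    # Swap in DeepSpeed's injected kernels and re-run the same query.
    pipe.model = deepspeed.init_inference(
        pipe.model,
        dtype=torch.float16,
        replace_with_kernel_inject=True,
        use_triton=True,  # assumed knob for the Triton attention path
    )
    injected = pipe(query)

    # Compare the top predicted token rather than raw fp16 scores.
    assert baseline[0]["token_str"] == injected[0]["token_str"]
```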
is_triton_supported added to Accelerator class (b6c47f6b)
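A minimal sketch of the check such a method might perform, assuming it gates on CUDA compute capability and an importable `triton` package; the commit below says compute cap > 8.0, so treating 8.x (Ampere) as the floor here is an assumption, not the PR's exact logic:

```python
# Sketch only; DeepSpeed's real method lives on its Accelerator classes.
import torch

def is_triton_supported() -> bool:
    if not torch.cuda.is_available():
        return False
    major, _minor = torch.cuda.get_device_capability()
    if major < 8:  # assumed floor: Ampere-class (8.x) or newer
        return False
    try:
        import triton  # noqa: F401
    except ImportError:
        return False
    return True
```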
stephen-youn marked this pull request as ready for review 2 years ago
triton supports flash attention when compute capability > 8.0 (1ec42552)
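With the accelerator-level check in place, call sites can gate kernel selection on it. A sketch assuming DeepSpeed's `get_accelerator()` entry point; the backend names are illustrative:

```python
# Gating sketch; backend name strings are illustrative.
from deepspeed.accelerator import get_accelerator

def pick_attention_backend() -> str:
    if get_accelerator().is_triton_supported():
        return "triton_flash_attn_2"  # fused kernel on capable GPUs
    return "torch_reference"          # plain PyTorch attention fallback
```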
formatting (26853e9d)
fix comments (f30f64ca)
cleanup (401b3d5a)
lekurile requested changes on 2023-09-14
cleanup flash kernel (fae2ab9a)
Merge branch 'master' into styoun/triton-flash2 (4bae6079)
fix according to the PR comment (4e6c1646)
lekurile approved these changes on 2023-09-20
Merge branch 'master' into styoun/triton-flash2 (544f1ddc)