DeepSpeed
adds triton flash attention2 kernel
#4337
Merged

Commits
  • initial commit
    styoun committed 2 years ago
  • temp commit: needs debugging
    styoun committed 2 years ago
  • packed flash attn with mask works
    styoun committed 2 years ago
  • clean-up
    styoun committed 2 years ago
  • add bert/roberta tests to test_inference
    styoun committed 2 years ago
  • is_triton_supported added to Accelerator class
    styoun committed 2 years ago
  • triton supports the flash attention when compute cap > 8.0
    styoun committed 2 years ago
  • formatting
    styoun committed 2 years ago
  • fix comments
    styoun committed 2 years ago
  • cleanup
    styoun committed 2 years ago
  • cleanup flash kernel
    styoun committed 2 years ago
  • Merge branch 'master' into styoun/triton-flash2
    styoun committed 2 years ago
  • fix according to the PR comment
    styoun committed 2 years ago
  • Merge branch 'master' into styoun/triton-flash2
    lekurile committed 2 years ago
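
The commits "is_triton_supported added to Accelerator class" and "triton supports the flash attention when compute cap > 8.0" suggest a capability gate: the Triton flash-attention kernel is only enabled on NVIDIA GPUs with CUDA compute capability 8.0 (Ampere) or newer. A minimal sketch of such a check, assuming the capability is given as a (major, minor) pair like the one `torch.cuda.get_device_capability()` returns (the function name and signature here are illustrative, not DeepSpeed's actual API):

```python
def is_triton_supported(capability):
    """Hypothetical sketch of a Triton-support gate.

    `capability` is a (major, minor) CUDA compute-capability tuple,
    e.g. (8, 0) for A100. Per the PR, Triton's flash-attention kernel
    requires compute capability >= 8.0, so older architectures such as
    Volta (7, 0) or Turing (7, 5) are rejected.
    """
    major, minor = capability
    return (major, minor) >= (8, 0)


# Ampere (A100) qualifies; Turing (T4) does not.
print(is_triton_supported((8, 0)))  # True
print(is_triton_supported((7, 5)))  # False
```

In a real integration the tuple would typically come from `torch.cuda.get_device_capability()` after confirming a CUDA device is available.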