DeepSpeed
adds triton flash attention2 kernel
#4337
Merged

Commits
  • initial commit
    styoun committed 2 years ago
  • temp commit: needs debugging
    styoun committed 2 years ago
  • packed flash attn with mask works
    styoun committed 2 years ago
  • clean-up
    styoun committed 2 years ago
  • add bert/roberta tests to test_inference
    styoun committed 2 years ago
  • is_triton_supported added to Accelerator class
    styoun committed 2 years ago
  • triton supports the flash attention when compute cap > 8.0
    styoun committed 2 years ago
  • formatting
    styoun committed 2 years ago
  • fix comments
    styoun committed 2 years ago
  • cleanup
    styoun committed 2 years ago
  • cleanup flash kernel
    styoun committed 2 years ago
  • Merge branch 'master' into styoun/triton-flash2
    styoun committed 2 years ago
  • fix according to the PR comment
    styoun committed 2 years ago
  • Merge branch 'master' into styoun/triton-flash2
    lekurile committed 2 years ago
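
The commits "is_triton_supported added to Accelerator class" and "triton supports the flash attention when compute cap > 8.0" suggest a capability gate: the Triton flash-attention kernel is only enabled on NVIDIA GPUs with CUDA compute capability 8.0 (Ampere) or newer. A minimal sketch of such a check, assuming the capability is given as a (major, minor) pair like the one `torch.cuda.get_device_capability()` returns (the function name and signature here are illustrative, not DeepSpeed's actual API):

```python
def is_triton_supported(capability):
    """Hypothetical sketch of a Triton-support gate.

    `capability` is a (major, minor) CUDA compute-capability tuple,
    e.g. (8, 0) for A100. Per the PR, Triton's flash-attention kernel
    requires compute capability >= 8.0, so older architectures such as
    Volta (7, 0) or Turing (7, 5) are rejected.
    """
    major, minor = capability
    return (major, minor) >= (8, 0)


# Ampere (A100) qualifies; Turing (T4) does not.
print(is_triton_supported((8, 0)))  # True
print(is_triton_supported((7, 5)))  # False
```

In a real integration the tuple would typically come from `torch.cuda.get_device_capability()` after confirming a CUDA device is available.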