DeepSpeed
Optimize Softmax Kernel
#3112
Merged

Commits
  • Simplify kernel
    molly-smith committed 2 years ago
  • Coalesce memory attempt 1. Logits divergence.
    molly-smith committed 2 years ago
  • Logits fix?
    molly-smith committed 2 years ago
  • sync after every global mem access
    molly-smith committed 2 years ago
  • template on iterations. Down to 8.3% cuda time for 8k tokens
    molly-smith committed 2 years ago
  • Up to 64 iterations
    molly-smith committed 2 years ago
  • Add alibi/mask check
    molly-smith committed 2 years ago
  • fp32
    molly-smith committed 2 years ago
  • Revert builder.py
    molly-smith committed 2 years ago
  • naming. precommit
    molly-smith committed 2 years ago
  • Revert "naming. precommit"
    molly-smith committed 2 years ago
  • naming. spacing
    molly-smith committed 2 years ago
  • Spacing. simplify checks
    molly-smith committed 2 years ago
  • remove bsyncs
    molly-smith committed 2 years ago
  • missed bsyncs
    molly-smith committed 2 years ago
  • Merge branch 'master' into mosm/softmax
    molly-smith committed 2 years ago
  • precommit
    molly-smith committed 2 years ago
  • Merge branch 'master' into mosm/softmax
    molly-smith committed 2 years ago