DeepSpeed
Optimize Softmax Kernel
#3112
Merged

molly-smith merged 18 commits into master from mosm/softmax
molly-smith Simplify kernel
473b7640
molly-smith Coalesce memory attempt 1. Logits divergence.
5a49cecd
molly-smith Logits fix?
0e949b9d
molly-smith sync after every global mem access
b51551c8
molly-smith template on iterations. Down to 8.3% cuda time for 8k tokens
0260509e
molly-smith Up to 64 iterations
fd803088
molly-smith Add alibi/mask check
2d7d1749
molly-smith fp32
b7ad2a2d
molly-smith Revert builder.py
bc450d48
molly-smith naming. precommit
150eb7d9
molly-smith Revert "naming. precommit"
de998223
molly-smith naming. spacing
77233e0f
molly-smith Spacing. simplify checks
9b2697ea
molly-smith remove bsyncs
386b14aa
molly-smith missed bsyncs
5a304ef1
molly-smith Merge branch 'master' into mosm/softmax
936e3f0c
molly-smith precommit
5f71f179
molly-smith marked this pull request as ready for review 2 years ago
molly-smith requested a review from RezaYazdaniAminabadi 2 years ago
molly-smith requested a review from awan-10 2 years ago
molly-smith requested a review from jeffra 2 years ago
molly-smith requested a review from cmikeh2 2 years ago
molly-smith requested a review from arashb 2 years ago
cmikeh2 approved these changes on 2023-04-04
molly-smith Merge branch 'master' into mosm/softmax
572eb28a
molly-smith enabled auto-merge (squash) 2 years ago
molly-smith merged e73de8ce into master 2 years ago
molly-smith deleted the mosm/softmax branch 2 years ago
