Optimize Softmax Kernel #3112
Simplify kernel
473b7640
Coalesce memory attempt 1. Logits divergence.
5a49cecd
Logits fix?
0e949b9d
sync after every global mem access
b51551c8
template on iterations. Down to 8.3% cuda time for 8k tokens
0260509e
Up to 64 iterations
fd803088
Add alibi/mask check
2d7d1749
fp32
b7ad2a2d
Revert builder.py
bc450d48
naming. precommit
150eb7d9
Revert "naming. precommit"
de998223
naming. spacing
77233e0f
Spacing. simplify checks
9b2697ea
remove bsyncs
386b14aa
missed bsyncs
5a304ef1
Merge branch 'master' into mosm/softmax
936e3f0c
precommit
5f71f179
molly-smith marked this pull request as ready for review 2 years ago
cmikeh2 approved these changes on 2023-04-04
Merge branch 'master' into mosm/softmax
572eb28a
molly-smith deleted the mosm/softmax branch 2 years ago