CUDA: faster softmax via shared memory + fp16 math #4742
CUDA: faster softmax via shared memory + fp16 math
64c46fc6
fixup! CUDA: faster softmax via shared memory + fp16 math
ae26053d
fixup! fixup! CUDA: faster softmax via shared memory + fp16 math
e1936bb5
fixup! CUDA: faster softmax via shared memory + fp16 math
44f30434
fixup! CUDA: faster softmax via shared memory + fp16 math
5d64a0c0
slaren
approved these changes
on 2024-01-09
Assignees
No one assigned