llama.cpp
CUDA: faster softmax via shared memory + fp16 math
#4742
Merged

Commits
  • CUDA: faster softmax via shared memory + fp16 math
    JohannesGaessler committed 2 years ago
  • fixup! CUDA: faster softmax via shared memory + fp16 math
    JohannesGaessler committed 2 years ago
  • fixup! fixup! CUDA: faster softmax via shared memory + fp16 math
    JohannesGaessler committed 2 years ago
  • fixup! CUDA: faster softmax via shared memory + fp16 math
    JohannesGaessler committed 2 years ago
  • fixup! CUDA: faster softmax via shared memory + fp16 math
    JohannesGaessler committed 2 years ago
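
The commits above carry no code excerpt, so as context, here is a minimal host-side sketch of the numerically stable softmax that such a kernel computes. This is only an illustration of the math: the PR's actual CUDA implementation stages each row in shared memory and uses fp16 arithmetic for part of the computation, neither of which is shown here.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Reference softmax over one row, using the standard max-subtraction
// trick for numerical stability (shifting by the row max keeps every
// exponent <= 0, so exp() cannot overflow). A CUDA kernel computes the
// same result per row, typically with a parallel max/sum reduction.
std::vector<float> softmax(const std::vector<float>& x) {
    float max_val = x[0];
    for (float v : x) max_val = std::fmax(max_val, v);

    std::vector<float> y(x.size());
    float sum = 0.0f;
    for (size_t i = 0; i < x.size(); ++i) {
        y[i] = std::exp(x[i] - max_val);  // shifted exponent, value in (0, 1]
        sum += y[i];
    }
    for (float& v : y) v /= sum;  // normalize so the row sums to 1
    return y;
}
```

The max-subtraction changes nothing mathematically (the shift cancels in the ratio) but is what makes an fp16-heavy implementation viable, since half precision overflows at ~65504.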