llama.cpp
PR #4742 (merged): CUDA: faster softmax via shared memory + fp16 math

JohannesGaessler force-pushed 2 years ago
JohannesGaessler force-pushed to 99407a86 2 years ago
JohannesGaessler committed 64c46fc6 "CUDA: faster softmax via shared memory + fp16 math"
JohannesGaessler force-pushed from 99407a86 to 64c46fc6 2 years ago
JohannesGaessler committed ae26053d "fixup! CUDA: faster softmax via shared memory + fp16 math"
JohannesGaessler committed e1936bb5 "fixup! fixup! CUDA: faster softmax via shared memory + fp16 math"
JohannesGaessler committed 44f30434 "fixup! CUDA: faster softmax via shared memory + fp16 math"
JohannesGaessler committed 5d64a0c0 "fixup! CUDA: faster softmax via shared memory + fp16 math"
slaren approved these changes on 2024-01-09
JohannesGaessler merged 8f900abf into master 2 years ago
