llama.cpp
CUDA: faster softmax via shared memory + fp16 math
#4742

Merged

JohannesGaessler merged 5 commits into ggml-org:master from JohannesGaessler:cuda-faster-softmax

JohannesGaessler force pushed to 99407a86 2 years ago

CUDA: faster softmax via shared memory + fp16 math

64c46fc6

JohannesGaessler force pushed from 99407a86 to 64c46fc6 2 years ago

fixup! CUDA: faster softmax via shared memory + fp16 math

ae26053d

fixup! fixup! CUDA: faster softmax via shared memory + fp16 math

e1936bb5

fixup! CUDA: faster softmax via shared memory + fp16 math

44f30434

fixup! CUDA: faster softmax via shared memory + fp16 math

5d64a0c0

slaren approved these changes on 2024-01-09

JohannesGaessler merged 8f900abf into master 2 years ago

Reviewers

slaren
