llama.cpp
dc685be4 - CUDA: add FP32 FlashAttention vector kernel (#7188)

Commit

1 year ago

CUDA: add FP32 FlashAttention vector kernel (#7188)

* CUDA: add FP32 FlashAttention vector kernel
* fixup! CUDA: add FP32 FlashAttention vector kernel
* fixup! fixup! CUDA: add FP32 FlashAttention vector kernel
* fixup! fixup! fixup! CUDA: add FP32 FlashAttention vector kernel
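The commit message only names the technique. As a rough illustration, and not code taken from this commit, the sketch below shows the core idea behind a FlashAttention-style "vector" kernel. Its assumptions are that a single query row is attended against all keys and values, and that a running maximum and running sum are kept so the full row of softmax weights never has to be stored. In this sketch "FP32" just means every accumulation is done in 32-bit `float`. The file names suggest the new kernel sits alongside an existing FP16 variant (`fattn-vec-f16.cu`). The function names `attn_vec_f32_online` and `attn_vec_f32_naive` are hypothetical. The sketch ignores masking, the per-thread tiling a CUDA kernel would use, and the actual K/V layouts in `fattn-vec-f32.cu`.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical CPU reference (NOT the llama.cpp kernel): computes
// softmax(scale * K q)^T V for ONE query vector with a FlashAttention-style
// online softmax, accumulating in FP32.
// q: [D], K: [n_kv][D] row-major, V: [n_kv][D] row-major.
std::vector<float> attn_vec_f32_online(const std::vector<float>& q,
                                       const std::vector<float>& K,
                                       const std::vector<float>& V,
                                       int D, int n_kv, float scale) {
    assert(q.size() == static_cast<std::size_t>(D));
    assert(K.size() == static_cast<std::size_t>(n_kv) * D && V.size() == K.size());

    float m = -std::numeric_limits<float>::infinity(); // running max of scores
    float s = 0.0f;                                     // running sum of exp(score - m)
    std::vector<float> acc(D, 0.0f);                    // running sum of exp(score - m) * v_j

    for (int j = 0; j < n_kv; ++j) {
        float score = 0.0f;                             // q . k_j in FP32
        for (int d = 0; d < D; ++d) score += q[d] * K[j * D + d];
        score *= scale;

        // When a new maximum appears, rescale the partial sums built so far.
        const float m_new = std::max(m, score);
        const float corr  = std::exp(m - m_new);        // exp(-inf) == 0 on the first step
        const float p     = std::exp(score - m_new);
        s = s * corr + p;
        for (int d = 0; d < D; ++d) acc[d] = acc[d] * corr + p * V[j * D + d];
        m = m_new;
    }
    for (int d = 0; d < D; ++d) acc[d] /= s;            // normalize once at the end
    return acc;
}

// Straightforward two-pass reference that materializes all n_kv weights.
std::vector<float> attn_vec_f32_naive(const std::vector<float>& q,
                                      const std::vector<float>& K,
                                      const std::vector<float>& V,
                                      int D, int n_kv, float scale) {
    std::vector<float> w(n_kv);
    float m = -std::numeric_limits<float>::infinity();
    for (int j = 0; j < n_kv; ++j) {
        float score = 0.0f;
        for (int d = 0; d < D; ++d) score += q[d] * K[j * D + d];
        w[j] = score * scale;
        m = std::max(m, w[j]);
    }
    float s = 0.0f;
    for (int j = 0; j < n_kv; ++j) { w[j] = std::exp(w[j] - m); s += w[j]; }
    std::vector<float> out(D, 0.0f);
    for (int j = 0; j < n_kv; ++j)
        for (int d = 0; d < D; ++d) out[d] += w[j] * V[j * D + d];
    for (int d = 0; d < D; ++d) out[d] /= s;
    return out;
}
```

For the same inputs, the two functions should agree up to FP32 rounding. The online version needs only `O(D)` extra state regardless of `n_kv`, which is what allows a GPU kernel to stream over the KV cache. Subtracting the running maximum keeps `std::exp` from overflowing even when raw scores are large.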

References

#7188 - CUDA: add FP32 FlashAttention vector kernel

Author

JohannesGaessler

Parents

6f1b6360

Files (9)

ggml-cuda.cu
ggml-cuda/
- common.cuh
- fattn-common.cuh
- fattn-vec-f16.cu
- fattn-vec-f16.cuh
- fattn-vec-f32.cu
- fattn-vec-f32.cuh
- fattn.cu
tests/
- test-backend-ops.cpp
