llama.cpp
b958151e - cuda : use half2 in softmax

Commit

2 years ago

cuda : use half2 in softmax

References

#5021 - ggml : add Flash Attention

Author

ggerganov

Parents
