llama.cpp
08e69c50 - cuda : adapt soft_max to F16 mask and pos

Commit

1 year ago

cuda : adapt soft_max to F16 mask and pos

References

#5021 - ggml : add Flash Attention

Author

ggerganov
