llama.cpp
2e460137 - cuda : fix soft_max to use correct mask size

Commit

1 year ago

cuda : fix soft_max to use correct mask size

References

#5021 - ggml : add Flash Attention

Author

ggerganov
