llama.cpp
Commit a7b47156 - cuda : switch to 1 warp for bs > 16
Date: 1 year ago
References: #5021 - ggml : add Flash Attention
Author: ggerganov
Parents: b958151e
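The subject line is terse, so below is a minimal sketch of the dispatch pattern it describes: a host-side launcher that picks the number of warps per block from the batch size, dropping to a single warp per block once bs > 16. This is not the actual llama.cpp flash-attention kernel from this commit; all kernel and function names here are hypothetical.

```cuda
// Minimal sketch of warp-count dispatch by batch size.
// Names (attn_sketch, launch_attn_sketch) are illustrative only.

#include <cuda_runtime.h>
#include <cstdio>

constexpr int WARP_SIZE = 32;

// Toy kernel templated on the number of warps per block, so each
// warp configuration compiles to its own specialized code path.
template <int nwarps>
__global__ void attn_sketch(const float * src, float * dst, int n) {
    const int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        dst[i] = src[i]; // stand-in for the real per-token work
    }
}

// Dispatch mirroring the commit's idea: for bs > 16 launch blocks of
// a single warp; for smaller batches use wider (4-warp) blocks.
static void launch_attn_sketch(const float * src, float * dst,
                               int n, int bs, cudaStream_t stream) {
    if (bs > 16) {
        const int threads = 1 * WARP_SIZE;
        attn_sketch<1><<<(n + threads - 1) / threads, threads, 0, stream>>>(src, dst, n);
    } else {
        const int threads = 4 * WARP_SIZE;
        attn_sketch<4><<<(n + threads - 1) / threads, threads, 0, stream>>>(src, dst, n);
    }
}

int main() {
    const int n = 1024;
    float *src, *dst;
    cudaMalloc(&src, n * sizeof(float));
    cudaMalloc(&dst, n * sizeof(float));
    launch_attn_sketch(src, dst, n, /*bs=*/32, 0); // bs > 16 -> 1 warp/block
    cudaDeviceSynchronize();
    printf("done: %s\n", cudaGetErrorString(cudaGetLastError()));
    cudaFree(src);
    cudaFree(dst);
    return 0;
}
```

The rationale is an assumption on my part, not stated in the commit: at larger batch sizes there are already enough thread blocks in flight to fill the GPU, so narrow single-warp blocks avoid intra-block synchronization overhead, while small batches benefit from wider blocks to keep occupancy up.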