llama.cpp
Commit: a3c28439
cuda : fine-tune >= VOLTA params + use MMQ only for small batches
Date: 2 years ago
References
#3776 - cuda : improve text-generation and batched decoding performance
Author
ggerganov
Parents
16b60dd7