llama.cpp
4c72ab13 - metal : use mm kernels for batch size > 2

Commit

2 years ago

metal : use mm kernels for batch size > 2

References

#3228 - llama : custom attention mask + parallel decoding + no context swaps

Author

ggerganov
