llama.cpp
214b6a35 - ggml : adjust mul_mat_f16 work memory (#1226)

Commit · 2 years ago
ggml : adjust mul_mat_f16 work memory (#1226)

* llama : minor - remove explicit int64_t cast
* ggml : reduce memory buffer for F16 mul_mat when not using cuBLAS
* ggml : add asserts to guard for incorrect wsize