llama.cpp
214b6a35 - ggml : adjust mul_mat_f16 work memory (#1226)

Commit · 2 years ago
ggml : adjust mul_mat_f16 work memory (#1226)

* llama : minor - remove explicit int64_t cast
* ggml : reduce memory buffer for F16 mul_mat when not using cuBLAS
* ggml : add asserts to guard for incorrect wsize