llama.cpp
10e5b148 - llama-quant : correct `n_attention_wv` usage (#20357)

llama-quant : correct `n_attention_wv` usage (#20357)

* llama-quant : correct `n_attention_wv` usage

  In #19770, I introduced a regression in the way the `quantize_state_impl` counter values were initialized: I was incrementing and using `n_attention_wv` in the same loop, when it should already hold its final value by the time we decide tensor types in `llama_tensor_get_type_impl` (for `use_more_bits`). I never observed a difference in any of [my tests](https://github.com/ggml-org/llama.cpp/pull/19770#issuecomment-4000424712); it was only after @bartowski kindly pointed this out that I realized it was incorrect. (Thanks!)

* simplify