llama.cpp
444f00b0 - llama : remove quantization sanity check (#17788)

Commit
12 days ago
llama : remove quantization sanity check (#17788)

* llama : remove quantization sanity check

  This commit removes the quantization sanity check for attention layers.
  The motivation for this is that there are hybrid models that contain
  recurrent layers, expert layers, and attention layers. For these models
  the current check fails because the expert layers are not taken into
  account. After consideration, it was decided that this check is not
  strictly necessary, and it can be removed to allow for more flexible
  model architectures.

* llama : remove unused pruned_attention_w and is_clip_model vars
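For illustration, here is a minimal sketch of why this kind of check breaks for hybrid architectures. It is not the actual llama.cpp code; the layer_kind enum and the counting logic are assumptions made for the example. The idea is that a check which expects one attention weight tensor per layer only holds for pure transformer stacks:

```cpp
// Hypothetical sketch (not the removed llama.cpp check): count the attention
// tensors the quantizer would see and compare against the total layer count.
#include <cstdio>
#include <vector>

enum class layer_kind { attention, recurrent, experts };

int main() {
    // A hybrid model: only some layers are attention layers.
    std::vector<layer_kind> layers = {
        layer_kind::attention, layer_kind::recurrent,
        layer_kind::experts,   layer_kind::attention,
    };

    // Count quantizable attention weight tensors.
    int n_attention = 0;
    for (auto k : layers) {
        if (k == layer_kind::attention) {
            ++n_attention;
        }
    }

    // A "one attention tensor per layer" assumption fails here, because
    // recurrent and expert layers contribute no attention tensors.
    const int n_layer = (int) layers.size();
    if (n_attention != n_layer) {
        std::printf("sanity check would fail: %d attention tensors vs %d layers\n",
                    n_attention, n_layer);
    }
    return 0;
}
```

Under this reading, either the check would have to account for every non-attention layer type, or it can simply be dropped, which is what the commit does.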