llama.cpp
444f00b0 - llama : remove quantization sanity check (#17788)

Commit
12 days ago
llama : remove quantization sanity check (#17788)

* llama : remove quantization sanity check

  This commit removes the quantization sanity check for attention layers.
  The motivation for this is that there are hybrid models that contain
  recurrent layers, expert layers, and attention layers. For these models
  the current check fails because the expert layers are not taken into
  account. After consideration, it was decided that this check is not
  strictly necessary, and it can be removed to allow for more flexible
  model architectures.

* llama : remove unused pruned_attention_w and is_clip_model vars
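For illustration, here is a minimal sketch of why this kind of check breaks for hybrid architectures. It is not the actual llama.cpp code; the layer_kind enum and the counting logic are assumptions made for the example. The idea is that a check which expects one attention weight tensor per layer only holds for pure transformer stacks:

```cpp
// Hypothetical sketch (not the removed llama.cpp check): count the attention
// tensors the quantizer would see and compare against the total layer count.
#include <cstdio>
#include <vector>

enum class layer_kind { attention, recurrent, experts };

int main() {
    // A hybrid model: only some layers are attention layers.
    std::vector<layer_kind> layers = {
        layer_kind::attention, layer_kind::recurrent,
        layer_kind::experts,   layer_kind::attention,
    };

    // Count quantizable attention weight tensors.
    int n_attention = 0;
    for (auto k : layers) {
        if (k == layer_kind::attention) {
            ++n_attention;
        }
    }

    // A "one attention tensor per layer" assumption fails here, because
    // recurrent and expert layers contribute no attention tensors.
    const int n_layer = (int) layers.size();
    if (n_attention != n_layer) {
        std::printf("sanity check would fail: %d attention tensors vs %d layers\n",
                    n_attention, n_layer);
    }
    return 0;
}
```

Under this reading, either the check would have to account for every non-attention layer type, or it can simply be dropped, which is what the commit does.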