text-generation-inference
cbced7f0 - feat: adjust attn weight loading logic (#1975)

Committed 1 year ago
This PR updates `load_attention` to prefer the attention weight layout specific to each model type when loading. It also removes a duplication: `TensorParallelColumnLinear.load_multi` was previously called from two separate code paths, and this PR consolidates them into a single path.
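The change described above follows a common pattern. A model-type check chooses which weight prefixes to read, and one shared loader call then concatenates them. The sketch below is a minimal, self-contained illustration under stated assumptions. It is not TGI's actual code. `Config`, `load_multi`, `FUSED_QKV_NAMES` and the list-of-rows weight representation are hypothetical stand-ins. The mapping `phi3` → `qkv_proj` and `baichuan` → `W_pack` is an assumption based on how those models name their fused QKV projection in Hugging Face checkpoints.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical stand-in for a checkpoint: weight name -> matrix stored as a
# list of rows (no real tensor library assumed).
Weights = Dict[str, List[List[float]]]

# Assumption for illustration: model types whose checkpoints store Q, K and V
# as one fused projection, mapped to that projection's (assumed) name.
FUSED_QKV_NAMES = {
    "phi3": "qkv_proj",
    "baichuan": "W_pack",
}


@dataclass
class Config:
    model_type: str


def load_multi(weights: Weights, prefixes: List[str]) -> List[List[float]]:
    """Hypothetical helper: concatenate the named matrices along dim 0 (rows)."""
    rows: List[List[float]] = []
    for name in prefixes:
        # Copy each row so the result does not alias the checkpoint data.
        rows.extend(list(row) for row in weights[f"{name}.weight"])
    return rows


def load_attention(config: Config, prefix: str, weights: Weights) -> List[List[float]]:
    """Return a single QKV matrix, choosing the weight layout by model type."""
    fused_name = FUSED_QKV_NAMES.get(config.model_type)
    if fused_name is not None:
        # Checkpoint already stores a fused projection: read it as one piece.
        prefixes = [f"{prefix}.{fused_name}"]
    else:
        # Default layout: separate q/k/v projections, stacked row-wise.
        prefixes = [f"{prefix}.{p}" for p in ("q_proj", "k_proj", "v_proj")]
    # One call site for the multi-weight loader, whatever the layout.
    return load_multi(weights, prefixes)


weights = {
    "layer.q_proj.weight": [[1.0]],
    "layer.k_proj.weight": [[2.0]],
    "layer.v_proj.weight": [[3.0]],
    "layer.qkv_proj.weight": [[4.0], [5.0], [6.0]],
}
print(load_attention(Config("llama"), "layer", weights))  # [[1.0], [2.0], [3.0]]
print(load_attention(Config("phi3"), "layer", weights))   # [[4.0], [5.0], [6.0]]
```

Routing both layouts through one loader call keeps the concatenation logic in a single place. In a real tensor-parallel loader, fused and split layouts may still need different sharding (for example, splitting a fused QKV matrix per head group). This sketch deliberately ignores that.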