feat: adjust attn weight loading logic (#1975)

Commit

1 year ago

feat: adjust attn weight loading logic (#1975) This PR updates `load_attention` to prefer loading specific attention based on the model type. Additionally there were two cases where `TensorParallelColumnLinear.load_multi` was called and this reduces it to a single path