text-generation-inference
85dfc392 - Add Phi-3 medium support (#2039)

Commit
1 year ago
Add Phi-3 medium support (#2039) Add support for Phi-3-medium The main difference between the medium and mini models is that medium uses grouped query attention with a packed QKV matrix. This change adds support for GQA with packed matrixes to `Weights.get_weights_col_packed` and uses it for Phi-3. This also allows us to remove the custom implementation of GQA from dbrx attention loading.
Author
Parents
Loading