[WIP] Add support for Mistral-Nemo by supporting head_dim through config (#2254)
* Support passing head_dim through config
* Using `head_dim` as a fallback is necessary since it's a non-standard
key in `MistralConfig` (as defined in transformers).
* Shorter diff.
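
The lookup described above can be sketched as follows. This is a minimal
illustration, not the code from this change: `resolve_head_dim` is a
hypothetical helper, and the config is stood in for by a plain namespace
with illustrative values. It assumes that when `head_dim` is absent the
head size is derived as `hidden_size // num_attention_heads`.

```python
from types import SimpleNamespace


def resolve_head_dim(config):
    """Return the per-head dimension for an attention layer.

    Hypothetical helper: prefers an explicit `head_dim` on the config and
    otherwise derives it from `hidden_size` and `num_attention_heads`.
    """
    # `head_dim` may be missing from the config entirely, so read it defensively.
    head_dim = getattr(config, "head_dim", None)
    if head_dim is not None:
        return head_dim
    # Assumed conventional derivation when no explicit value is given.
    return config.hidden_size // config.num_attention_heads


# Illustrative values only: the explicit head_dim differs from the derived one.
explicit = SimpleNamespace(hidden_size=5120, num_attention_heads=32, head_dim=128)
derived = SimpleNamespace(hidden_size=4096, num_attention_heads=32)

print(resolve_head_dim(explicit))
print(resolve_head_dim(derived))
```

Reading the key with `getattr` keeps configs that do not define `head_dim`
working unchanged, while configs that do define it are no longer forced
into the derived value.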
---------
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>