[mistral] Support passing `head_dim` through config (and do not require `head_dim * num_heads == hidden_size`) (#32050)
* Allow `head_dim` to be set in Mistral config
* Add docstring for `head_dim`
* Do not require `head_dim * num_heads == hidden_size`
* [run-slow] mistral
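
A minimal sketch of what decoupling `head_dim` from `hidden_size` means for the attention projection shapes. `ToyAttentionConfig` and `projection_shapes` are hypothetical names used only for illustration, not the library's classes. The fallback of `hidden_size // num_attention_heads` when `head_dim` is unset is an assumption about the intended default, and the sizes are illustrative values.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyAttentionConfig:
    """Hypothetical stand-in for a model config; not the real Mistral config class."""

    hidden_size: int
    num_attention_heads: int
    num_key_value_heads: int
    head_dim: Optional[int] = None  # may now be set independently of hidden_size

    def __post_init__(self) -> None:
        # Assumed default: fall back to the old coupled value when head_dim is unset.
        if self.head_dim is None:
            self.head_dim = self.hidden_size // self.num_attention_heads


def projection_shapes(cfg: ToyAttentionConfig) -> dict:
    """Return (in_features, out_features) for each attention projection."""
    q_out = cfg.num_attention_heads * cfg.head_dim
    kv_out = cfg.num_key_value_heads * cfg.head_dim
    return {
        "q_proj": (cfg.hidden_size, q_out),
        "k_proj": (cfg.hidden_size, kv_out),
        "v_proj": (cfg.hidden_size, kv_out),
        # The output projection maps back from num_heads * head_dim to hidden_size,
        # so it is no longer necessarily square.
        "o_proj": (q_out, cfg.hidden_size),
    }


# Coupled case: head_dim is derived, so num_heads * head_dim == hidden_size.
coupled = ToyAttentionConfig(hidden_size=4096, num_attention_heads=32, num_key_value_heads=8)

# Decoupled case: head_dim is passed explicitly and 32 * 128 != 5120.
decoupled = ToyAttentionConfig(
    hidden_size=5120, num_attention_heads=32, num_key_value_heads=8, head_dim=128
)

print(projection_shapes(coupled))
print(projection_shapes(decoupled))
```

In the decoupled case the query and output projections become rectangular, which is why a check enforcing `head_dim * num_heads == hidden_size` would have to be dropped.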