Fix max_length criteria when using inputs_embeds (#28994)
* fix max_length for inputs_embeds
* make style
* Update src/transformers/generation/utils.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Static Cache: load models with MQA or GQA (#28975)
* fix
* fix tests
* fix tests
* Update src/transformers/generation/utils.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* more fixes
* make style
---------
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>