Fix the initialization of the cache when using multiple GPUs (#33303)
* init cache multi-gpu
* Update src/transformers/generation/utils.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* switch to execution device map
* make naming more consistent
* fix
* mutually exclusive device
* added an integration example
* remove useless check
* apply suggestion from Joao + typing
* fix a couple of typos and add test
* revert check
---------
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>