Fix static generation when compiling! (#28937)
* wow I was scared!
* fix everything
* nits
* make it BC?
* add todo
* nits
* is_tracing should still be used to pass tracing tests
* nits
* some nits to make sure genration works with static cache uncompiled
* fix sdpa
* fix FA2 for both static and dynamic in a better way?
* style
* fix-copies
* fix fix copies
* fix sequential beam searcg
* style
* use `keys_to_ignore`
* nit
* correct dtype inference when init
* :( the fix for FA2 is still not optimal to investigate!
* styling
* nits
* nit
* this might work better
* add comment
* Update src/transformers/models/llama/modeling_llama.py
* "position_ids" -> "cache_position"
* style
* nit
* Remove changes that should no be propagatted just yet
* Apply suggestions from code review
* Styling
* make sure we raise an errir for static cache with FA2 enabled
* move to the bottom of the signature
* style
* Update src/transformers/models/llama/modeling_llama.py
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* Update src/transformers/models/llama/modeling_llama.py
* nit in the name
---------
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>