Self-speculation (Layer-Skip Llama) (#34240)
* 😅
* early exit (#34244)
* mvp
* docs and tests
* a few fixes
* no shared cache
* Apply suggestions from code review
* docs
* make fix-copies
* cohere fix
* [test all]
* [test all] consistent model code copies
* [test all] make fix-copies :D
* Apply suggestions from code review
* Update src/transformers/generation/candidate_generator.py
* Update src/transformers/generation/configuration_utils.py
* [test all] don't use a stand-alone attribute; fix test
---------
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: Joao Gante <joao@huggingface.co>
Co-authored-by: Mostafa Elhoushi <m.elhoushi@ieee.org>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>