transformers
dc11a3cb - [core] Refactor the Cache logic to make it simpler and more general (#39797)

Commit
258 days ago
[core] Refactor the Cache logic to make it simpler and more general (#39797)

* Simplify the logic quite a bit
* Update cache_utils.py
* continue work
* continue simplifying a lot
* style
* Update cache_utils.py
* offloading much simpler
* style
* Update cache_utils.py
* update inits
* Update cache_utils.py
* consistency
* Update cache_utils.py
* update generate
* style
* fix
* fix
* add early_initialization
* fix
* fix mamba caches
* update
* fix
* fix
* fix
* fix tests
* fix configs
* revert
* fix tests
* alright
* Update modeling_gptj.py
* fix the constructors
* cache tests
* Update test_cache_utils.py
* fix
* simplify
* back to before -> avoid compile bug
* doc
* mistral test
* llama4 test dtype
* Update test_modeling_llama4.py
* CIs
* Finally find a nice impl
* Update cache_utils.py
* Update cache_utils.py
* add lazy methods in autodoc
* typo
* better doc
* Add detailed docstring for lazy init
* CIs
* style
* fix