Fix BLT training_ci overfit test (#42685)
* Fix BLT training_ci overfit test by disabling cache and adjusting training thresholds
* Format BLT tests with ruff
* Fix BLT training CI with custom weight initialization and overfit test
* Update BLT init logic and adjust repo checks for non-functional model wrappers
* Fix repo/config checks by marking BLT Text/Vision models as placeholders
* Document BLT weight initialization sources and restore default overfit thresholds
* Align BLT weight init with nn.init
* Fix BLT init weights and remove modular conversion issues
* Fix CircleCI failures
* fix
* Fix recurrent_gemma overfit generation with cache
* rerun circleci
* Log RecurrentGemma cache exception in training mixin
* ci: rerun
---------
Co-authored-by: Ferdinand Mom <47445085+3outeille@users.noreply.github.com>