transformers
0f97c688 - Fix BLT training_ci overfit test (#42685)

Commit
4 days ago
Fix BLT training_ci overfit test (#42685)

* Fix BLT training_ci overfit test by disabling cache and adjusting training thresholds
* Fix BLT training_ci overfit test by disabling cache and adjusting training thresholds
* Fix BLT training_ci overfit test by disabling cache and adjusting training thresholds
* Format BLT tests with ruff
* Fix BLT training CI with custom weight initialization and overfit test
* Fix BLT training CI with custom weight initialization and overfit test
* Fix BLT training CI with custom weight initialization and overfit test
* Fix BLT training CI with custom weight initialization and overfit test
* Fix BLT training CI with custom weight initialization and overfit test
* Fix BLT training CI with custom weight initialization and overfit test
* Update BLT init logic and adjust repo checks for non-functional model wrappers
* Fix repo/config checks by marking BLT Text/Vision models as placeholders
* Fix repo/config checks by marking BLT Text/Vision models as placeholders
* Fix repo/config checks by marking BLT Text/Vision models as placeholders
* Document BLT weight initialization sources and restore default overfit thresholds
* Align BLT weight init with nn.init
* Fix BLT init weights and remove modular conversion issues
* fixes circle ci failures
* fix
* fix
* fix recurrent_gemma overfit generation with cache
* Fix recurrent_gemma overfit generation with cache
* rerun circleci
* rerun circleci
* Log RecurrentGemma cache exception in training mixin
* ci: rerun
* ci: rerun
* ci: rerun

---------

Co-authored-by: Ferdinand Mom <47445085+3outeille@users.noreply.github.com>