transformers
5a1016a9 - Remove unconditional train_batch_size assignment (#43770)

Commit

17 hours ago

Remove unconditional train_batch_size assignment (#43770)

* Remove unconditional train_batch_size assignment

The train_batch_size should only be saved to TrainerState when auto_find_batch_size is enabled (which is already handled in the auto_find_batch_size block at line 2251). The unconditional assignment caused issues when resuming from checkpoint with different batch configurations.

Fixes #43708

* Add test for train_batch_size not saved without auto_find_batch_size

* Only restore train_batch_size from checkpoint when auto_find_batch_size is enabled

Fixes #43708

When resuming from a checkpoint, the trainer was unconditionally restoring the saved train_batch_size, overwriting the user's current batch size configuration. This caused incorrect max_steps calculation when users wanted to resume training with a different batch size.

Now the checkpoint's train_batch_size is only restored when auto_find_batch_size=True, as that feature specifically needs to resume with the automatically-found batch size. Otherwise, the user's current args batch size is used.

Added test to verify users can change batch size when resuming.
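The resume rule described in this commit message can be illustrated with a small standalone sketch. It does not reproduce the actual Trainer code: `Args`, `State`, `resolve_batch_size`, and `steps_per_epoch` are hypothetical stand-ins introduced here for illustration, and only the field names `train_batch_size` and `auto_find_batch_size` come from the message itself.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class Args:
    """Hypothetical stand-in for the user's current training arguments."""
    train_batch_size: int
    auto_find_batch_size: bool = False


@dataclass
class State:
    """Hypothetical stand-in for the state loaded from a checkpoint."""
    train_batch_size: Optional[int] = None


def resolve_batch_size(args: Args, checkpoint_state: State) -> int:
    """Pick the batch size to use when resuming.

    The checkpoint value wins only when auto_find_batch_size is enabled
    and a value was actually saved; otherwise the user's current
    configuration is used.
    """
    if args.auto_find_batch_size and checkpoint_state.train_batch_size is not None:
        return checkpoint_state.train_batch_size
    return args.train_batch_size


def steps_per_epoch(num_examples: int, batch_size: int) -> int:
    """Number of optimizer steps per epoch, assuming no gradient accumulation."""
    return math.ceil(num_examples / batch_size)


# Checkpoint was written with batch size 8; the user now asks for 16.
resumed = resolve_batch_size(Args(train_batch_size=16), State(train_batch_size=8))
print(resumed, steps_per_epoch(1000, resumed))
```

With the old unconditional restore, the run above would have continued with the saved batch size of 8, so any step count derived from the batch size would no longer match the configuration the user asked for.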

References

#43770 - Remove unconditional train_batch_size assignment

Author

lordaarush

Parents

711f2797
