DeepSpeed
cfc6ed37 - bf16_optimizer: fixes to different grad acc dtype (#6485)

Commit
1 year ago
bf16_optimizer: fixes to different grad acc dtype (#6485)

- fix step function to cast to FP32 before step in case of different gradient accumulation data type
- remove redundant function initialize_optimizer_states()
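The first change can be illustrated with a minimal, standard-library-only sketch. It is not DeepSpeed's code: `Tensor`, `cast`, `sgd_step`, and `step` are hypothetical names, bfloat16 is emulated by rounding float32 bit patterns, and plain SGD stands in for whatever inner optimizer is actually used. The only point shown is that gradients accumulated in a non-FP32 dtype are cast to FP32 before the optimizer updates the FP32 master weights.

```python
import struct
from dataclasses import dataclass
from typing import List


def to_fp32(x: float) -> float:
    """Round a Python float to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def to_bf16(x: float) -> float:
    """Emulate bfloat16 by rounding a float32 to its top 16 bits.

    Round-to-nearest-even; intended for finite values only.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits += 0x7FFF + ((bits >> 16) & 1)
    bits &= 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


@dataclass
class Tensor:
    """Hypothetical stand-in for a tensor: a dtype tag plus values."""
    dtype: str  # "fp32" or "bf16"
    values: List[float]


def cast(t: Tensor, dtype: str) -> Tensor:
    """Return a copy of t rounded to the requested dtype."""
    rounder = to_fp32 if dtype == "fp32" else to_bf16
    return Tensor(dtype, [rounder(v) for v in t.values])


def sgd_step(master: Tensor, grad: Tensor, lr: float) -> None:
    """Plain SGD update that, like a typical optimizer, expects the
    gradient dtype to match the parameter dtype."""
    if grad.dtype != master.dtype:
        raise TypeError(f"grad is {grad.dtype}, param is {master.dtype}")
    master.values = [
        to_fp32(p - lr * g) for p, g in zip(master.values, grad.values)
    ]


def step(master: Tensor, acc_grad: Tensor, lr: float) -> None:
    """Cast the accumulated gradient to FP32 before stepping when the
    accumulation dtype differs from FP32."""
    if acc_grad.dtype != "fp32":
        acc_grad = cast(acc_grad, "fp32")
    sgd_step(master, acc_grad, lr)


master = Tensor("fp32", [1.0])
acc_grad = cast(Tensor("fp32", [0.1]), "bf16")  # accumulated in bf16
step(master, acc_grad, lr=0.5)
print(master)
```

Calling `sgd_step` directly with the bf16 gradient raises `TypeError` in this sketch, which is the kind of dtype mismatch the cast in `step` avoids.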