DeepSpeed
407708cd - add support for tensor learning rate (vs scalar) (#7633)

Commit
74 days ago
add support for tensor learning rate (vs scalar) (#7633)

This change adds support for using a tensor learning rate value instead of a scalar one. We found this helpful when the optimizer is compiled with torch.compile: in such cases, changing a scalar LR value can trigger recompilation, which degrades performance. The implementation lets the model script determine the type of LR value used by setting the initial value.

Co-authored-by: Masahiro Tanaka <81312776+tohtana@users.noreply.github.com>
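
The idea described above (the type of the initial LR value decides how later updates are applied) can be illustrated with a minimal sketch in plain PyTorch. This is not the code from this commit: `set_lr` is a hypothetical helper introduced only for illustration, and it assumes a PyTorch version whose `torch.optim.Adam` accepts a tensor `lr`.

```python
import torch


def set_lr(optimizer, new_lr):
    """Hypothetical helper: update the LR while preserving its original type."""
    for group in optimizer.param_groups:
        if torch.is_tensor(group["lr"]):
            # Tensor LR: write the new value in place so the optimizer keeps
            # referring to the same tensor object.
            group["lr"].fill_(new_lr)
        else:
            # Scalar LR: rebind the entry to a new Python float.
            group["lr"] = float(new_lr)


model = torch.nn.Linear(4, 2)

# The script picks the LR type through the initial value it passes in.
tensor_opt = torch.optim.Adam(model.parameters(), lr=torch.tensor(1e-3))
scalar_opt = torch.optim.Adam(model.parameters(), lr=1e-3)

lr_before = tensor_opt.param_groups[0]["lr"]
set_lr(tensor_opt, 5e-4)
set_lr(scalar_opt, 5e-4)
```

With the tensor variant, the object stored in `param_groups` stays the same across updates and only its contents change, which is what avoids handing a compiled optimizer step a new Python constant each time the schedule advances.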