DeepSpeed
a3cee49b - 1. allow not resetting gpu_sum when calling has_overflow, 2. Clone the param grad before putting it in the swap out gradient buffer since underlying param.grad buffer is reused. 3. Do not return -1 for norm when total norm is NaN Inf. Just return the computed value

Committed 4 years ago

1. Allow calling has_overflow without resetting gpu_sum.
2. Clone the param grad before putting it in the swap-out gradient buffer, since the underlying param.grad buffer is reused.
3. Do not return -1 for the norm when the total norm is NaN or Inf; just return the computed value.
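Item 1 of the commit message can be illustrated with a minimal sketch. It assumes an optimizer that accumulates gradient values into a running gpu_sum and reports overflow when that sum is no longer finite. Only the names gpu_sum and has_overflow come from the commit message; the class OverflowTracker, the method accumulate, and the reset_gpu_sum flag are hypothetical and are not DeepSpeed's actual API.

```python
import math
from typing import Iterable


class OverflowTracker:
    """Hypothetical stand-in for an optimizer's overflow bookkeeping.

    gpu_sum mirrors the name in the commit message; everything else here
    (class name, accumulate, reset_gpu_sum) is illustrative only.
    """

    def __init__(self) -> None:
        # Running sum of gradient values; any inf or nan propagates into it.
        self.gpu_sum = 0.0

    def accumulate(self, grads: Iterable[float]) -> None:
        self.gpu_sum += sum(grads)

    def has_overflow(self, reset_gpu_sum: bool = True) -> bool:
        # A non-finite sum signals an inf or nan in the accumulated gradients.
        overflow = not math.isfinite(self.gpu_sum)
        if reset_gpu_sum:
            # Resetting by default keeps the always-reset behavior that the
            # commit message implies existed before this change.
            self.gpu_sum = 0.0
        return overflow
```

Passing reset_gpu_sum=False lets a caller inspect the overflow state without discarding the accumulated signal, so a later check in the same step still sees it.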
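Item 2 of the commit message describes an aliasing hazard: if a reference to a reused gradient buffer is queued for swap-out, later in-place writes to that buffer overwrite the data still waiting in the queue. The sketch below models this with plain Python lists standing in for tensors; SwapOutQueue and run_two_steps are hypothetical names, and list(grad) plays the role that torch.Tensor.clone() would play for a PyTorch tensor.

```python
from typing import List


class SwapOutQueue:
    """Hypothetical queue of gradients waiting to be swapped out."""

    def __init__(self, clone: bool) -> None:
        self.clone = clone
        self.pending: List[List[float]] = []

    def put(self, grad: List[float]) -> None:
        # Without a copy, the queue stores a reference to the caller's buffer,
        # so later in-place writes to that buffer change the queued data.
        self.pending.append(list(grad) if self.clone else grad)


def run_two_steps(clone: bool) -> List[List[float]]:
    grad_buffer = [0.0, 0.0]  # stands in for a reused param.grad buffer
    queue = SwapOutQueue(clone)
    for step_grad in ([1.0, 2.0], [3.0, 4.0]):
        grad_buffer[:] = step_grad  # overwrite the same buffer in place
        queue.put(grad_buffer)
    return queue.pending
```

With clone=False both queued entries end up holding the second step's values; with clone=True each entry keeps the values it had when it was queued.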
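Item 3 of the commit message changes what the norm computation returns when the result is NaN or Inf. The sketch below contrasts the two behaviors for an L2 norm over a flat list of gradient values; both function names are hypothetical, and the L2 choice is an assumption made for illustration.

```python
import math
from typing import Iterable


def total_norm_returning_sentinel(grads: Iterable[float]) -> float:
    # Behavior the commit removes: replace a non-finite norm with -1.
    total = math.sqrt(sum(g * g for g in grads))
    if math.isinf(total) or math.isnan(total):
        return -1.0
    return total


def total_norm(grads: Iterable[float]) -> float:
    # Behavior after the commit: return whatever was computed, inf or nan included.
    return math.sqrt(sum(g * g for g in grads))
```

One reading of this change is that a caller can test the raw value with math.isfinite and tell inf apart from nan, whereas -1 looks like an ordinary number to any code that does not know about the sentinel.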

Author: samyam
Parent: 1947b38d