[bugfix] promote state in bf16_optimizer (#5767)
This patch promotes `state` in `bf16_optimizer` so that it is accessible in downstream DeepSpeed use cases.
For example, without the patch we hit the following error in the Megatron-DeepSpeed
LLaMA showcase:
```
[rank3]: Traceback (most recent call last):
[rank3]: File "/yahao/Megatron-DeepSpeed/pretrain_gpt.py", line 356, in <module>
[rank3]: pretrain(train_valid_test_datasets_provider,
[rank3]: File "/yahao/Megatron-DeepSpeed/megatron/training.py", line 222, in pretrain
[rank3]: iteration = train(forward_step_func,
[rank3]: File "/yahao/Megatron-DeepSpeed/megatron/training.py", line 1264, in train
[rank3]: report_memory_flag = training_log(loss_dict, total_loss_dict,
[rank3]: File "/yahao/Megatron-DeepSpeed/megatron/training.py", line 999, in training_log
[rank3]: opt_stats[0] += (torch.norm(optimizer.state[param]['exp_avg_sq']).item())**2
[rank3]: AttributeError: 'BF16_Optimizer' object has no attribute 'state'
```
With the patch, the invocation completes successfully.
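
The shape of the change is roughly the sketch below: downstream code expects `optimizer.state[param]` to work on the BF16 wrapper just as it does on a plain `torch.optim` optimizer, so the wrapper needs to surface the inner optimizer's per-parameter state. This is an illustrative sketch only, not the exact diff; the property-based forwarding and attribute names here are assumptions.

```python
# Minimal sketch (illustrative, not the actual patch): forward per-parameter
# state from the wrapped fp32 optimizer so callers can do
# optimizer.state[param]['exp_avg_sq'] on the BF16 wrapper.
class BF16_Optimizer:
    def __init__(self, init_optimizer):
        # Underlying fp32 optimizer (e.g. Adam) that actually owns the state.
        self.optimizer = init_optimizer

    @property
    def state(self):
        # Delegate attribute access to the inner optimizer's state dict,
        # so code written against torch.optim.Optimizer keeps working.
        return self.optimizer.state
```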
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>