[bugfix] promote state in bf16_optimizer (#5767)
This patch promotes `state` in `bf16_optimizer` so that it is accessible in downstream DeepSpeed use cases.
For example, without the patch we hit the following error in the Megatron-DeepSpeed
LLaMA showcase:
```
[rank3]: Traceback (most recent call last):
[rank3]: File "/yahao/Megatron-DeepSpeed/pretrain_gpt.py", line 356, in <module>
[rank3]: pretrain(train_valid_test_datasets_provider,
[rank3]: File "/yahao/Megatron-DeepSpeed/megatron/training.py", line 222, in pretrain
[rank3]: iteration = train(forward_step_func,
[rank3]: File "/yahao/Megatron-DeepSpeed/megatron/training.py", line 1264, in train
[rank3]: report_memory_flag = training_log(loss_dict, total_loss_dict,
[rank3]: File "/yahao/Megatron-DeepSpeed/megatron/training.py", line 999, in training_log
[rank3]: opt_stats[0] += (torch.norm(optimizer.state[param]['exp_avg_sq']).item())**2
[rank3]: AttributeError: 'BF16_Optimizer' object has no attribute 'state'
```
With the patch, the invocation completes successfully.
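
The shape of the change is roughly the sketch below: downstream code expects `optimizer.state[param]` to work on the BF16 wrapper just as it does on a plain `torch.optim` optimizer, so the wrapper needs to surface the inner optimizer's per-parameter state. This is an illustrative sketch only, not the exact diff; the property-based forwarding and attribute names here are assumptions.

```python
# Minimal sketch (illustrative, not the actual patch): forward per-parameter
# state from the wrapped fp32 optimizer so callers can do
# optimizer.state[param]['exp_avg_sq'] on the BF16 wrapper.
class BF16_Optimizer:
    def __init__(self, init_optimizer):
        # Underlying fp32 optimizer (e.g. Adam) that actually owns the state.
        self.optimizer = init_optimizer

    @property
    def state(self):
        # Delegate attribute access to the inner optimizer's state dict,
        # so code written against torch.optim.Optimizer keeps working.
        return self.optimizer.state
```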
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>