[fix] Change the condition of ValueError in "convert_checkpoint_from_transformers_to_megatron" (#24769)
* fix: half-precision inference error
norm_factor remained torch.float32 after calling model.half(), so it is now registered via register_buffer, which lets model.half() convert it to torch.float16 along with the rest of the module
* fix: pass the argument persistent=False to register_buffer so the buffer is excluded from the state_dict
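A minimal sketch of the pattern the commits above describe, using a hypothetical attention-like module (the class and attribute names here are illustrative, not the actual transformers code): a plain tensor attribute is not converted by model.half(), whereas a buffer registered with register_buffer is, and persistent=False keeps it out of the state_dict so saved checkpoints are unaffected.

```python
import torch
import torch.nn as nn

class ToyAttention(nn.Module):
    """Illustrative module; not the actual transformers implementation."""

    def __init__(self, head_dim: int = 64):
        super().__init__()
        # Before the fix: a plain tensor attribute stays float32 after .half()
        self.norm_factor_attr = torch.sqrt(
            torch.tensor(head_dim, dtype=torch.float32)
        )
        # After the fix: a registered buffer follows the module's dtype;
        # persistent=False excludes it from the state_dict
        self.register_buffer(
            "norm_factor",
            torch.sqrt(torch.tensor(head_dim, dtype=torch.float32)),
            persistent=False,
        )

model = ToyAttention().half()
print(model.norm_factor_attr.dtype)         # torch.float32 -- untouched by .half()
print(model.norm_factor.dtype)              # torch.float16 -- converted with the module
print("norm_factor" in model.state_dict())  # False -- non-persistent buffer
```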
* run make style
* [fix] Change the ValueError condition in convert_checkpoint_from_transformers_to_megatron
* [fix] Correct the error-message wording: layers -> attention heads
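For context, a hedged sketch of the kind of divisibility check such a conversion script performs and the error wording the last commit corrects; the function name, parameters, and message below are hypothetical, not the actual transformers code.

```python
def check_tensor_parallel_split(num_attention_heads: int, tp_size: int) -> None:
    """Hypothetical check: Megatron shards attention heads across
    tensor-parallel ranks, so the head count must divide evenly."""
    if num_attention_heads % tp_size != 0:
        # The wording fix: the quantity being split is attention heads,
        # not layers
        raise ValueError(
            f"Number of attention heads ({num_attention_heads}) must be "
            f"divisible by the tensor parallel size ({tp_size})."
        )

check_tensor_parallel_split(16, 4)  # OK: 16 heads split across 4 ranks
try:
    check_tensor_parallel_split(16, 3)
except ValueError as exc:
    print(exc)
```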