Default to set_to_none when clearing gradients in the BF16 optimizer (#5434)
As discussed in #5175, make set_to_none the default for clearing
gradients in the BF16 optimizer.
Additionally, when gradients are zeroed in place instead of being set to None, use foreach_zero.
Verified correctness with Megatron-DeepSpeed (mega-ds) Llama 7B training.
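
For illustration, the two clearing modes can be sketched roughly as below. This is not the actual optimizer code: `clear_grads` and the toy parameter list are hypothetical, and it assumes that "foreach_zero" refers to PyTorch's multi-tensor `torch._foreach_zero_`.

```python
import torch


def clear_grads(params, set_to_none=True):
    """Hypothetical helper mirroring the two clearing modes described above."""
    if set_to_none:
        # Default path: drop the gradient tensors entirely.
        for p in params:
            p.grad = None
        return
    # Zero-clearing path: zero all existing gradients in one multi-tensor call
    # (assumed to be what "foreach_zero" refers to).
    grads = [p.grad for p in params if p.grad is not None]
    if grads:
        torch._foreach_zero_(grads)


# Toy BF16 parameters with non-zero gradients (illustrative only).
params = [torch.nn.Parameter(torch.ones(4, dtype=torch.bfloat16)) for _ in range(3)]
for p in params:
    p.grad = torch.full_like(p.data, 2.0)

clear_grads(params, set_to_none=False)
zeroed_in_place = all(bool((p.grad == 0).all()) for p in params)

clear_grads(params)  # default: set_to_none=True
cleared_to_none = all(p.grad is None for p in params)
```

With set_to_none, the gradient tensors are released rather than overwritten, so any code that reads `.grad` afterwards has to handle `None`.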
FYI @loadams
---------
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>