add option to force amp to use bfloat16 instead of float16 (#169449)
Summary:
Adds an option to force AMP to use bfloat16 instead of float16, for models that hardcode bf16, such as modded-nanogpt.
Tested with `python benchmarks/dynamo/torchbench.py --performance --training --amp --backend inductor --device cuda --only modded_nanogpt --disable-cudagraphs` together with https://github.com/pytorch/benchmark/pull/2660.
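For context, this is a minimal sketch of the behavior the option forces: in eager PyTorch, selecting bf16 under AMP corresponds to passing `dtype=torch.bfloat16` to `torch.autocast` rather than taking the float16 default. The model and shapes below are illustrative, not from the PR; see the linked PR for the actual flag it adds to the benchmark harness.

```python
import torch

# Illustrative model; torch.compile with the inductor backend mirrors
# the benchmark configuration used for testing above.
model = torch.nn.Linear(1024, 1024).cuda()
opt_model = torch.compile(model, backend="inductor")

x = torch.randn(8, 1024, device="cuda")

# Forcing bf16 under AMP: autocast-eligible ops (e.g. matmul) run in
# bfloat16 instead of the default float16.
with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
    out = opt_model(x)

print(out.dtype)  # torch.bfloat16
```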
X-link: https://github.com/pytorch/pytorch/pull/169449
Approved by: https://github.com/Lucaskabela
Reviewed By: huydhn
Differential Revision: D88319178
fbshipit-source-id: 684cfcf47cf67f735556904c61d6c7d7ba8cb726