enable torch.cpu.amp.autocast (#57386)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57386
Here is the PR for what's discussed in the RFC https://github.com/pytorch/pytorch/issues/55374 to enable the autocast for CPU device. Currently, this PR only enable BF16 as the lower precision datatype.
Changes:
1. Enable new API `torch.cpu.amp.autocast` for autocast on CPU device: include the python API, C++ API, new Dispatchkey etc.
2. Consolidate the implementation for each cast policy sharing between CPU and GPU devices.
3. Add the operation lists to corresponding cast policy for cpu autocast.
Test Plan: Imported from OSS
Reviewed By: soulitzer
Differential Revision: D28572219
Pulled By: ezyang
fbshipit-source-id: db3db509973b16a5728ee510b5e1ee716b03a152