Fix alignment issues for Fake BFP16 fp32 -> bfp16 rounding routines (#18321)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18321
As title: fixes alignment issues in the Fake BFP16 fp32 -> bfp16 rounding routines.
Reviewed By: jspark1105
Differential Revision: D14575512
fbshipit-source-id: 0e33cdab54b1aef8b67f0b4c366692c5dbdf631d