Speed up copysign for half and bfloat16 types (#47413)
Summary:
This also avoids internal compiler errors on aarch64 platforms and thereby fixes https://github.com/pytorch/pytorch/issues/47395
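A minimal sketch of one way copysign on 16-bit floats can skip the round trip through float, assuming the sign is stored in the top bit of the raw pattern (true for both IEEE half and bfloat16). The helper name and the raw uint16_t interface are hypothetical and are not taken from this change:

```cpp
#include <cstdint>

// Hypothetical helper, not a PyTorch API: works on raw 16-bit patterns.
// IEEE half and bfloat16 both keep the sign in bit 15, so the same
// mask-and-or applies to either format.
inline std::uint16_t copysign_bits16(std::uint16_t magnitude,
                                     std::uint16_t sign) {
  // Keep the exponent and mantissa bits of `magnitude`,
  // take only the sign bit of `sign`.
  return static_cast<std::uint16_t>((magnitude & 0x7FFFu) | (sign & 0x8000u));
}
```

Because only integer masks are involved, this form needs no floating-point conversion on either operand.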
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47413
Reviewed By: walterddr
Differential Revision: D24745921
Pulled By: malfet
fbshipit-source-id: 790e5b91d9116670c882d838b3862d5b47178d68