Construct `c10::Half` from `float16_t` on ARMv8 (#120425)
This is done by hiding the float32 constructors and exposing the float16 ones. It lets the compiler perform implicit conversions as needed and, in safe cases, optimize out unneeded upcasts to fp32; see the example [below](https://godbolt.org/z/5TKnY4cos):
```cpp
#include <arm_neon.h>
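#ifndef __ARM_FEATURE_FP16_SCALAR_ARITHMETIC
#error "This example requires scalar FP16 arithmetic support"
#endif

// Adds without an explicit upcast.
float16_t sum1(float16_t x, float16_t y) {
  return x + y;
}

// Upcasts to fp32, adds, then implicitly narrows the result back to fp16.
float16_t sum2(float16_t x, float16_t y) {
  return static_cast<float>(x) + static_cast<float>(y);
}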
```
Both sum variants compile to a scalar fp16 add when built for a platform that supports fp16 arithmetic:
```
sum1(half, half):                       // @sum1(half, half)
        fadd    h0, h0, h1
        ret
sum2(half, half):                       // @sum2(half, half)
        fadd    h0, h0, h1
        ret
```
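The constructor swap described at the top can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual `c10::Half` code: `HalfSketch`, `native_t`, and `add` are hypothetical names, the wrapper stores the native value directly rather than raw bits, and the non-ARM branch falls back to `float` only so the sketch compiles on other platforms.

```cpp
#include <cassert>

// Assumption: use the native half type only where the example above does,
// i.e. on AArch64 with scalar fp16 arithmetic; otherwise fall back to float
// so the sketch still compiles elsewhere.
#if defined(__aarch64__) && defined(__ARM_FEATURE_FP16_SCALAR_ARITHMETIC)
#include <arm_neon.h>
using native_t = float16_t;
#else
using native_t = float;
#endif

// Hypothetical stand-in for a half-precision wrapper; NOT c10::Half itself.
struct HalfSketch {
  native_t value;

  // The only converting constructor takes the native type, so on the ARM
  // branch no float32 constructor takes part in overload resolution.
  HalfSketch(native_t v) : value(v) {}

  // Matching conversion operator back to the native type.
  operator native_t() const { return value; }
};

// Arithmetic goes through the implicit conversions declared above.
inline HalfSketch add(HalfSketch a, HalfSketch b) {
  return HalfSketch(static_cast<native_t>(
      static_cast<native_t>(a) + static_cast<native_t>(b)));
}
```

With only the native-type constructor visible, a call such as `add(HalfSketch(1.5f), HalfSketch(2.25f))` leaves any widening or narrowing to the compiler, which is what allows it to drop the fp32 round trip where that is safe.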
Fixes a build error, introduced by #119483, in some aarch64 configurations that are defined as supporting FP16 but do not define `_Float16`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/120425
Approved by: https://github.com/mikekgfb, https://github.com/atalman, https://github.com/snadampal