[AARCH64] Hide FP16 scalar arithmetic behind proper feature flag (#122204)
On Apple Silicon, clang defines the FP16 feature macros without any extra flags:
```
% sysctl machdep.cpu.brand_string; clang -dM -E - < /dev/null|grep __ARM_FEATURE_FP16
machdep.cpu.brand_string: Apple M1
#define __ARM_FEATURE_FP16_FML 1
#define __ARM_FEATURE_FP16_SCALAR_ARITHMETIC 1
#define __ARM_FEATURE_FP16_VECTOR_ARITHMETIC 1
```
On Graviton2, gcc defines the FP16 feature macros when given the corresponding `-march` flag:
```
# ./cpuinfo/build/cpu-info |grep Microarch -A1; gcc -dM -E - -march=armv8.2-a+fp16 </dev/null | grep __ARM_FEATURE_FP16
Microarchitectures:
8x Neoverse N1
#define __ARM_FEATURE_FP16_SCALAR_ARITHMETIC 1
#define __ARM_FEATURE_FP16_VECTOR_ARITHMETIC 1
```
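The guard the title describes can be sketched roughly as below. This is an illustration only, not the code changed in this PR: `add_half`, `kHasFp16ScalarArithmetic`, and `self_check` are hypothetical names, and the non-FP16 fallback is an assumption made so the snippet builds on any target. `vaddh_f16` is the scalar half-precision add from `arm_fp16.h`.

```cpp
#include <cassert>

// Hypothetical example: use scalar FP16 arithmetic only when the compiler
// advertises it through __ARM_FEATURE_FP16_SCALAR_ARITHMETIC.
#if defined(__aarch64__) && defined(__ARM_FEATURE_FP16_SCALAR_ARITHMETIC)
#include <arm_fp16.h>

constexpr bool kHasFp16ScalarArithmetic = true;

// Adds two values in half precision using the scalar FP16 intrinsic.
inline float add_half(float a, float b) {
  const float16_t x = static_cast<float16_t>(a);
  const float16_t y = static_cast<float16_t>(b);
  return static_cast<float>(vaddh_f16(x, y));
}
#else
constexpr bool kHasFp16ScalarArithmetic = false;

// Assumed fallback: plain float addition. Results match the FP16 path only
// for inputs and sums that are exactly representable in half precision.
inline float add_half(float a, float b) {
  return a + b;
}
#endif

// 1.5, 2.25 and 3.75 are exactly representable in half precision,
// so both paths agree on this input.
inline void self_check() {
  assert(add_half(1.5f, 2.25f) == 3.75f);
}
```

Building with `-march=armv8.2-a+fp16` on Graviton2, or with default flags on Apple Silicon, would select the intrinsic path. Other configurations take the fallback.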
Test Plan: CI
Reviewed By: dimitribouche
Differential Revision: D55033347
Pull Request resolved: https://github.com/pytorch/pytorch/pull/122204
Approved by: https://github.com/huydhn