Fix `Vectorize<float>::trunc` on ARM platform (#59858)
Summary:
Use `vrndq_f32`, which maps to the `VRINTZ` instruction (`FRINTZ` on AArch64) and rounds floating-point values towards zero, matching `std::trunc` behaviour.
This makes the trunc implementation correct even for values that are representable in float32 but fall outside the int32 range, for example `-1.0e+20`; see the following [gist](https://gist.github.com/malfet/c612c9f4b3b5681ca1b2a69930825871):
```
inp= 3.1 2.7 -2.9 -1e+20
old_trunc= 3 2 -2 -2.14748e+09
new_trunc= 3 2 -2 -1e+20
```
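For illustration only, a scalar sketch of the difference shown above. It assumes the old path round-tripped through int32 with a saturating conversion, which is what the `old_trunc` output suggests. The helper names `trunc_via_int32` and `trunc_round_to_zero` are hypothetical and are not part of the PyTorch sources; `std::trunc` stands in for the per-lane behaviour of `vrndq_f32` so the sketch builds on any platform.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical helper: truncation by converting float -> int32 -> float.
// Out-of-range inputs are clamped explicitly, because a plain
// static_cast of such a value to int32 is undefined behaviour in C++.
inline float trunc_via_int32(float x) {
  constexpr float kTwoPow31 = 2147483648.0f;  // 2^31, exactly representable
  std::int32_t i;
  if (std::isnan(x)) {
    i = 0;
  } else if (x >= kTwoPow31) {
    i = std::numeric_limits<std::int32_t>::max();
  } else if (x <= -kTwoPow31) {
    i = std::numeric_limits<std::int32_t>::min();
  } else {
    i = static_cast<std::int32_t>(x);  // in range: truncates towards zero
  }
  return static_cast<float>(i);
}

// Hypothetical helper: rounds towards zero without leaving the
// floating-point domain, so large magnitudes are preserved.
inline float trunc_round_to_zero(float x) {
  return std::trunc(x);
}
```

Both helpers agree for inputs inside the int32 range; they diverge only once the magnitude reaches 2^31, where the int32 round trip clamps and the floating-point rounding does not.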
Fixes `test_reference_numerics_hard_trunc_cpu_float32` on M1
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59858
Reviewed By: kimishpatel
Differential Revision: D29052008
Pulled By: malfet
fbshipit-source-id: 6b567f39151538be1aa3890e3b4e1e978e598657