ggml : speed up ggml_vec_dot_q4_1() on ARM_NEON + add 32-bit ARM support (#900)

* ggml : speed up the q4_1 ARM_NEON path by ~5%
* ggml : implement vaddvq when missing
* ggml : implement vminvq and vmaxvq when missing
* ggml : implement vzip when missing
* ggml : fix comment
* ggml : try to use correct ifdef
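The bullets above add fallbacks for NEON intrinsics that 32-bit ARM toolchains lack, such as the horizontal reductions `vaddvq`, `vminvq` and `vmaxvq` and the `vzip1`/`vzip2` interleaves. The sketch below is a scalar model of what those intrinsics compute, so the lane semantics can be checked on any machine. It is not the ggml code. The `*_model` types and `model_*` functions are hypothetical stand-ins for the real `arm_neon.h` types and intrinsics. A real fallback would use NEON lane accessors and would typically sit behind a guard that is true only on 32-bit ARM, for example `#if defined(__ARM_NEON) && !defined(__aarch64__)`. The exact guard used in this change is not shown here. The model also does not try to reproduce hardware floating-point summation order or NaN handling.

```c
#include <stdint.h>

/* Hypothetical scalar stand-ins for NEON vector types (NOT arm_neon.h types),
   so this sketch compiles on any target. */
typedef struct { float   lane[4]; } f32x4_model;
typedef struct { int32_t lane[4]; } s32x4_model;
typedef struct { int8_t  lane[8]; } s8x8_model;

/* Horizontal add of all 4 lanes, as an AArch64 vaddvq_s32 would return. */
static int32_t model_vaddvq_s32(s32x4_model v) {
    return v.lane[0] + v.lane[1] + v.lane[2] + v.lane[3];
}

/* Horizontal float add; summation order here is left to right and may not
   match the hardware instruction bit for bit. */
static float model_vaddvq_f32(f32x4_model v) {
    return v.lane[0] + v.lane[1] + v.lane[2] + v.lane[3];
}

/* Minimum across lanes (NaN handling is not modelled). */
static float model_vminvq_f32(f32x4_model v) {
    float m = v.lane[0];
    for (int i = 1; i < 4; ++i) {
        if (v.lane[i] < m) m = v.lane[i];
    }
    return m;
}

/* Maximum across lanes (NaN handling is not modelled). */
static float model_vmaxvq_f32(f32x4_model v) {
    float m = v.lane[0];
    for (int i = 1; i < 4; ++i) {
        if (v.lane[i] > m) m = v.lane[i];
    }
    return m;
}

/* Interleave the LOW halves: {a0,b0,a1,b1,a2,b2,a3,b3}, like vzip1_s8. */
static s8x8_model model_vzip1_s8(s8x8_model a, s8x8_model b) {
    s8x8_model r;
    for (int i = 0; i < 4; ++i) {
        r.lane[2 * i]     = a.lane[i];
        r.lane[2 * i + 1] = b.lane[i];
    }
    return r;
}

/* Interleave the HIGH halves: {a4,b4,a5,b5,a6,b6,a7,b7}, like vzip2_s8. */
static s8x8_model model_vzip2_s8(s8x8_model a, s8x8_model b) {
    s8x8_model r;
    for (int i = 0; i < 4; ++i) {
        r.lane[2 * i]     = a.lane[4 + i];
        r.lane[2 * i + 1] = b.lane[4 + i];
    }
    return r;
}
```

For example, zipping `a = {0..7}` with `b = {10..17}` gives `{0,10,1,11,2,12,3,13}` from the low-half zip and `{4,14,5,15,6,16,7,17}` from the high-half zip. A dot-product kernel can rely on this pairing when it interleaves data from two vectors.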