ggml : use 8-bit precision for Q4_1 intermediate results (#1047)
* ggml : use 8-bit precision for Q4_1 intermediate results (ARM)
* ggml : optimize ggml_vec_dot_q4_1_q8_0() via vmlaq_n_f32
56 ms/token with Q4_1!
* ggml : AVX2 implementation of ggml_vec_dot_q4_1_q8_0 (#1051)
* gitignore : ignore ppl-*.txt files
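
For reference, a minimal scalar sketch of the kind of Q4_1 x Q8_0 block dot product that the SIMD paths above accelerate. The struct names, the block size of 32, and the nibble order (low nibble first) are assumptions for illustration and may not match the actual ggml definitions:

```c
#include <stddef.h>
#include <stdint.h>

#define QK 32 /* assumed number of values per block */

/* Hypothetical block layouts, loosely modelled on Q4_1 and Q8_0. */
typedef struct {
    float   d;          /* scale */
    float   m;          /* offset (minimum) */
    uint8_t qs[QK / 2]; /* two 4-bit quants per byte, low nibble first (assumed) */
} blk_q4_1;

typedef struct {
    float  d;      /* scale */
    int8_t qs[QK]; /* signed 8-bit quants */
} blk_q8_0;

/* With x_i = d_x*q_i + m_x and y_i = d_y*p_i, each block contributes
 *   d_x*d_y * sum(q_i*p_i) + m_x*d_y * sum(p_i),
 * so the inner loop needs only integer arithmetic. */
static float dot_q4_1_q8_0_ref(size_t nb, const blk_q4_1 *x, const blk_q8_0 *y) {
    float sum = 0.0f;
    for (size_t i = 0; i < nb; i++) {
        int32_t sum_qp = 0; /* integer sum of q_i * p_i */
        int32_t sum_p  = 0; /* integer sum of p_i */
        for (int j = 0; j < QK / 2; j++) {
            const int q0 = x[i].qs[j] & 0x0F; /* low nibble  */
            const int q1 = x[i].qs[j] >> 4;   /* high nibble */
            const int p0 = y[i].qs[2 * j];
            const int p1 = y[i].qs[2 * j + 1];
            sum_qp += q0 * p0 + q1 * p1;
            sum_p  += p0 + p1;
        }
        /* Floating-point work happens once per block, not once per element. */
        sum += x[i].d * y[i].d * (float) sum_qp
             + x[i].m * y[i].d * (float) sum_p;
    }
    return sum;
}
```

Keeping the per-element work in integers is what lets the NEON and AVX2 versions use 8-bit multiply-accumulate instructions; the scales and offset are applied once per block.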
---------
Co-authored-by: slaren <2141330+slaren@users.noreply.github.com>