metal : add gqa8 kernel to allow llama-2-70B on metal (#2459)
* Added gqa8 kernel to allow llama-2-70B on metal
* Update ggml-metal.m
Co-authored-by: Cebtenzzre <cebtenzzre@gmail.com>
* Extend kernel_mul_mat_f16_f32 to handle gqa broadcast
* Added ne03==ne13 assertion
---------
Co-authored-by: Cebtenzzre <cebtenzzre@gmail.com>