ggml : add some LSX support (#23798)
* loongarch : optimize LSX fp16 load/store with native intrinsics
Use __lsx_vfcvtl_s_h and __lsx_vfcvt_h_s instead of scalar loops in
__lsx_f16x4_load and __lsx_f16x4_store.
* loongarch : add LSX implementation for q8_0 dot product
* loongarch : add LSX implementation for q6_K dot product
* loongarch : add LSX implementation for iq4_xs dot product
* Improve reduce ops when summing int16 pairs to int32
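
For reference, the int16-pair reduction mentioned in the last item can be sketched in scalar C. This only illustrates the arithmetic (adjacent int16 values widened and summed into one int32 each); it is not the LSX code from this change, and the function name sum_i16_pairs_to_i32 is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar reference, not the LSX implementation:
 * sum adjacent int16 pairs into int32 results.
 *   in  : n int16 values (n must be even)
 *   out : n/2 int32 values, out[i] = in[2*i] + in[2*i + 1] */
static void sum_i16_pairs_to_i32(const int16_t *in, int32_t *out, size_t n) {
    assert(n % 2 == 0);
    for (size_t i = 0; i < n / 2; ++i) {
        /* widen each operand first so the sum cannot overflow int16 */
        out[i] = (int32_t) in[2 * i] + (int32_t) in[2 * i + 1];
    }
}
```

Widening before the add matters: two int16 values of 32767 sum to 65534, which only fits in the int32 result.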