llama.cpp
f3f62f0d - metal : optimize ggml_mul_mat_id (faster Mixtral PP) (#4725)

Commit

2 years ago

metal : optimize ggml_mul_mat_id (faster Mixtral PP) (#4725)

* ggml : disable fast-math for Metal (cmake build only)

ggml-ci

* metal : fix Metal API debug warnings
* cmake : add -fno-inline for Metal build (#4545)
* metal : fix API debug warnings
* metal : fix compile warnings
* metal : use uint64_t for strides
* cmake : rename option to LLAMA_METAL_SHADER_DEBUG
* metal : fix mat-vec Q8_0 kernel for BS > 1
* metal : normalize mat-vec kernel signatures
* cmake : respect LLAMA_QKK_64 option
* metal : fix mat-vec Q4_K kernel for QK_K == 64
* metal : optimizing ggml_mul_mat_id (wip)
* metal : minor fix
* metal : opt mul_mm_id

References

#4725 - metal : optimize ggml_mul_mat_id (faster Mixtral PP)

Author

ggerganov

Parents

0ef3ca2a
