k_quants tuning for Falcon-7b (#2816)


2 years ago

* Make ggml-cuda.cu build with QK_K = 64

Using LLAMA_CUDA_FORCE_DMMV = ON and -nommq, it runs and produces a meaningful result.

* k_quants tuning for Falcon-7b

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>