llama.cpp
b2acedeb
Commit
2 years ago
cuda : add F32 -> Q4_0 and F32 -> Q4_1 copy kernels
References
#4312 - llama : support quantum K cache
Author
ggerganov
Parents
e8457c90
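
The commit title above indicates that F32 -> Q4_0 and F32 -> Q4_1 copy kernels were added to the CUDA backend, which allows F32 data to be copied directly into quantized tensors (the referenced #4312 concerns a quantized K cache). The commit diff itself is not shown on this page, so the sketch below only illustrates what such a copy kernel does for Q4_0: each group of 32 floats is reduced to one per-block scale plus 32 packed 4-bit values. The block layout follows llama.cpp's Q4_0 format, but the kernel name, the launch scheme, and the assumption of a contiguous source buffer are illustrative, not taken from the commit.

```cuda
// Minimal sketch of an F32 -> Q4_0 copy (quantize) kernel, assuming a
// contiguous source buffer and one CUDA thread per 32-element block.
// The actual ggml-cuda kernel also handles arbitrary tensor strides;
// the names below (cpy_f32_q4_0_sketch, QK4_0_SKETCH) are hypothetical.
#include <cuda_fp16.h>
#include <cstdint>

#define QK4_0_SKETCH 32

struct block_q4_0_sketch {
    half    d;                      // per-block scale
    uint8_t qs[QK4_0_SKETCH / 2];   // 32 values packed as 4-bit nibbles
};

__global__ void cpy_f32_q4_0_sketch(const float * x, block_q4_0_sketch * y, int nblocks) {
    const int ib = blockIdx.x * blockDim.x + threadIdx.x;
    if (ib >= nblocks) {
        return;
    }

    const float * xb = x + ib * QK4_0_SKETCH;

    // find the value with the largest magnitude; its sign orients the
    // scale so that this value maps onto the -8 quantization level
    float amax = 0.0f;
    float vmax = 0.0f;
    for (int j = 0; j < QK4_0_SKETCH; ++j) {
        const float v = xb[j];
        if (fabsf(v) > amax) {
            amax = fabsf(v);
            vmax = v;
        }
    }

    const float d  = vmax / -8.0f;
    const float id = d != 0.0f ? 1.0f / d : 0.0f;

    y[ib].d = __float2half(d);

    // quantize to 4 bits with a +8 offset and pack two values per byte:
    // element j goes in the low nibble, element j+16 in the high nibble
    for (int j = 0; j < QK4_0_SKETCH / 2; ++j) {
        const float x0 = xb[j] * id;
        const float x1 = xb[j + QK4_0_SKETCH / 2] * id;

        const uint8_t xi0 = min(15, (int)(x0 + 8.5f));
        const uint8_t xi1 = min(15, (int)(x1 + 8.5f));

        y[ib].qs[j] = xi0 | (xi1 << 4);
    }
}
```

A host would launch this with one thread per block of 32 source values, e.g. `cpy_f32_q4_0_sketch<<<(nblocks + 255)/256, 256>>>(src, dst, nblocks)`; the Q4_1 variant differs only in storing a separate minimum alongside the scale instead of centering the range around zero.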