llama.cpp
95b8b8ec - metal: template GLU kernels to support f16/f32 (#23882)

Commit

2 days ago

metal: template GLU kernels to support f16/f32 (#23882)

Replaces the hardcoded f32 GLU kernels with a single templated kernel. Loads and stores now use the tensor's native type (half or float) to save memory bandwidth, while the arithmetic is still done in float to avoid overflow and precision loss in geglu/swiglu. The dispatch gate is also relaxed to accept f16 inputs.
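The load-native, compute-in-float, store-native pattern described above can be sketched on the host in plain C++. This is an illustrative analogue only, not the Metal kernel from the commit: the function name `swiglu_row`, its parameters, and the row-wise layout are assumptions made for the example, and `T` stands in for the tensor element type (half or float in Metal). The motivation for the float intermediate is easy to see for half: its largest finite value is 65504, so `exp(-x)` already overflows once x drops below roughly -11.09.

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical host-side analogue of a templated GLU kernel.
// T is the storage type of the tensors; all arithmetic is done in float.
template <typename T>
void swiglu_row(const T* x, const T* g, T* dst, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        // Load in the native type, then widen to float for the math.
        const float xf = static_cast<float>(x[i]);
        const float gf = static_cast<float>(g[i]);

        // SwiGLU: silu(x) * g, with silu(x) = x / (1 + exp(-x)).
        const float silu = xf / (1.0f + std::exp(-xf));

        // Narrow back to the native type only at the store.
        dst[i] = static_cast<T>(silu * gf);
    }
}
```

Instantiating the template once per element type gives one kernel body for both precisions, which is the de-duplication the commit message describes; only the loads and stores change width.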

References

#23882 - metal: template GLU kernels to support f16/f32

Author

shrivasshankar

Parents

55ac0909
