llama.cpp
95b8b8ec - metal: template GLU kernels to support f16/f32 (#23882)

Committed 2 days ago
Drops the hard-coded f32 GLU kernels in favor of a single template. We now load and store in the tensor's native type (half or float) to save memory bandwidth, but keep the ALU compute in float so the GeGLU/SwiGLU math does not overflow. Also opens up the dispatch gate to allow f16 inputs.
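Below is a minimal sketch of the pattern described above. It is written in plain C++ because the Metal Shading Language is C++-based and uses the same template mechanism. It is not the actual llama.cpp kernel: it covers only SwiGLU on the CPU, and `half_t`, `to_float`, `from_float`, `swiglu`, `DType` and `glu_supported` are hypothetical names. Standard C++ before C++23 has no half-precision type, so `half_t` stores raw binary16 bits and converts with round-to-nearest-even.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical f16 storage type: raw IEEE-754 binary16 bits.
struct half_t { std::uint16_t bits; };

// Widen on load. For float this is a no-op.
inline float to_float(float v) { return v; }

// Decode binary16 -> float (handles zero, subnormals, Inf and NaN).
inline float to_float(half_t h) {
    const std::uint32_t exp_mask = 0x7c00u << 13;
    std::uint32_t u = static_cast<std::uint32_t>(h.bits & 0x7fffu) << 13;
    const std::uint32_t e = u & exp_mask;
    u += 112u << 23;                                  // rebias exponent 15 -> 127
    float f;
    if (e == exp_mask) {                              // Inf / NaN
        u += 112u << 23;
        std::memcpy(&f, &u, sizeof f);
    } else if (e == 0) {                              // zero / subnormal
        u += 1u << 23;
        const std::uint32_t magic_u = 113u << 23;     // 2^-14 as float bits
        float magic;
        std::memcpy(&f, &u, sizeof f);
        std::memcpy(&magic, &magic_u, sizeof magic);
        f -= magic;                                   // renormalize
    } else {
        std::memcpy(&f, &u, sizeof f);
    }
    return (h.bits & 0x8000u) ? -f : f;
}

// Narrow on store. Only the float and half_t specializations are defined.
template <typename T> T from_float(float v);
template <> inline float from_float<float>(float v) { return v; }

// Encode float -> binary16 with round-to-nearest-even; overflow becomes Inf.
template <> inline half_t from_float<half_t>(float v) {
    std::uint32_t u;
    std::memcpy(&u, &v, sizeof u);
    const std::uint32_t sign = u & 0x80000000u;
    u ^= sign;                                        // work on |v|
    std::uint16_t h;
    if (u >= (143u << 23)) {                          // |v| >= 65536, Inf or NaN
        h = static_cast<std::uint16_t>(u > 0x7f800000u ? 0x7e00u : 0x7c00u);
    } else if (u < (113u << 23)) {                    // |v| < 2^-14: f16 subnormal or zero
        const std::uint32_t magic_u = 126u << 23;     // 0.5: its ulp equals the f16 subnormal step 2^-24
        float f, magic;
        std::memcpy(&f, &u, sizeof f);
        std::memcpy(&magic, &magic_u, sizeof magic);
        f += magic;                                   // FPU rounds to nearest-even here
        std::uint32_t fu;
        std::memcpy(&fu, &f, sizeof fu);
        h = static_cast<std::uint16_t>(fu - magic_u);
    } else {                                          // normal range
        const std::uint32_t mant_odd = (u >> 13) & 1u;
        u -= 112u << 23;                              // rebias exponent 127 -> 15
        u += 0xfffu + mant_odd;                       // round to nearest, ties to even
        h = static_cast<std::uint16_t>(u >> 13);      // a carry here can yield Inf (0x7c00)
    }
    return half_t{static_cast<std::uint16_t>(h | (sign >> 16))};
}

// One template replaces the per-type kernels: T is only the storage type.
template <typename T>
void swiglu(const T* x, const T* g, T* dst, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        const float a = to_float(x[i]);               // load in native type, widen
        const float b = to_float(g[i]);
        const float silu = a / (1.0f + std::exp(-a)); // math stays in float
        dst[i] = from_float<T>(silu * b);             // narrow only on store
    }
}

// Hypothetical dispatch gate: f16 now passes alongside f32.
enum class DType { F32, F16, Other };
inline bool glu_supported(DType t) { return t == DType::F32 || t == DType::F16; }
```

Instantiated with `half_t`, each element read or written is 2 bytes instead of 4, which is where the bandwidth saving comes from. The `exp` and the multiply still run in float. In f16, `exp(12)` (about 162755) already exceeds the largest finite value, 65504, so computing SiLU in half precision would overflow for moderately negative inputs.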