whisper.cpp
1b241b87 - hexagon: minor refresh for HMX FA and MM (llama/23796)

Commit
5 days ago
hexagon: minor refresh for HMX FA and MM (llama/23796) * hex-fa: clean up qf32/fp32 handling and stride handling * hex-fa: fix corner case fp NAN issues that were cause bad output from gemma4 on v79 * hex-fa: vectorize leftover handling * hex-fa: avoid HVX fallback during token gen HMX has more FP16 compute capacity * hmx-mm: remove dead code * hmx-mm: use fastdiv in x4x2 dequant * hmx-mm: sandwich dequant and scatter to improve perf * hmx-mm: fixed rebase conflicts * hmx-mm: further improve weight dequant by doing early type dispatch and precomputing fastdiv * hmx-mm: an even earlier dispatch for per-type dequant * hmx-mm: dequant linear types like q4_0 and q4_1 without the LUTs This is a bit faster than LUT. * hex-cmake: one more tweak for lto --------- Co-authored-by: Trivikram Reddy <tamarnat@qti.qualcomm.com>
Committer
Parents
Loading