llama.cpp
Hexagon add support for f16/f32 flash attention, scale, set-rows and improve f16/32 matmul
#18611
Merged

Commits
  • hexagon: improve fp16 matmul and add fp32/fp16 flash-attention
    max-krasnyansky committed 31 days ago
  • hexagon: add support for set-rows fp32 -> fp16 with i32/i64 row-idx
    max-krasnyansky committed 31 days ago
  • hexagon: add support for SCALE fp32
    max-krasnyansky committed 31 days ago
  • hexagon: replace scalar fp32 -> fp16 copy with HVX
    max-krasnyansky committed 31 days ago
  • hexagon: optimize flash_atten_ext with aligned VTCM buffers and DMA
    max-krasnyansky committed 31 days ago
  • hexagon: use aligned mad_f16
    max-krasnyansky committed 31 days ago
  • hexagon: flash_atten more aligned ops
    max-krasnyansky committed 31 days ago
  • hexagon: optimize scale_f32 hvx helpers
    max-krasnyansky committed 31 days ago
  • hexagon: unroll fa loops
    max-krasnyansky committed 31 days ago
  • hexagon: remove unused set-rows log
    max-krasnyansky committed 31 days ago
  • hexagon: flash_attn_ext add support for DMAing Q
    max-krasnyansky committed 31 days ago
  • hexagon: fix handling of NANs hvx dotproducts
    max-krasnyansky committed 31 days ago
  • hexagon: cleanup spad allocation in flash-atten
    max-krasnyansky committed 31 days ago
  • hexagon: improve fp16/fp32 matmul
    max-krasnyansky committed 31 days ago
  • hexagon: fix HVX_ARCH check
    max-krasnyansky committed 31 days ago
  • hexagon: matmul cleanup and fp16 fixes
    max-krasnyansky committed 31 days ago
  • hexagon: fix fp16 x fp16 matmuls and some minor refactoring
    max-krasnyansky committed 31 days ago
  • hexagon: add support for GET_ROWS f32 -> f32
    max-krasnyansky committed 31 days ago
  • hexagon: optimize set-rows threading
    max-krasnyansky committed 31 days ago
  • hexagon: update adb/run-bench.sh to properly support experimental and verbose options
    max-krasnyansky committed 31 days ago
  • hexagon: flash_atten use aligned vectors for dot products
    max-krasnyansky committed 31 days ago