llama.cpp #18611 (Merged)
Hexagon: add support for f16/f32 flash attention, scale, set-rows, and improve f16/f32 matmul
Commits
All 21 commits by max-krasnyansky, committed 31 days ago:

hexagon: improve fp16 matmul and add fp32/fp16 flash-attention
hexagon: add support for set-rows fp32 -> fp16 with i32/i64 row-idx
hexagon: add support for SCALE fp32
hexagon: replace scalar fp32 -> fp16 copy with HVX
hexagon: optimize flash_atten_ext with aligned VTCM buffers and DMA
hexagon: use aligned mad_f16
hexagon: flash_atten more aligned ops
hexagon: optimize scale_f32 hvx helpers
hexagon: unroll fa loops
hexagon: remove unused set-rows log
hexagon: flash_attn_ext add support for DMAing Q
hexagon: fix handling of NANs hvx dotproducts
hexagon: cleanup spad allocation in flash-atten
hexagon: improve fp16/fp32 matmul
hexagon: fix HVX_ARCH check
hexagon: matmul cleanup and fp16 fixes
hexagon: fix fp16 x fp16 matmuls and some minor refactoring
hexagon: add support for GET_ROWS f32 -> f32
hexagon: optimize set-rows threading
hexagon: update adb/run-bench.sh to properly support experimental and verbose options
hexagon: flash_atten use aligned vectors for dot products
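
The flash-attention commits above add and tune an fp16/fp32 flash_attn_ext path. As background, the op fuses out = softmax(scale * Q K^T) V into a single kernel. The sketch below is a plain-C scalar reference of that computation (single head, with the optional mask and softcap inputs omitted); it is only a semantic model, not the VTCM/DMA-based HVX implementation from this PR, and the function name flash_attn_ref is illustrative.

```c
// Plain-C scalar reference for the fused attention computation:
//   out = softmax(scale * Q K^T) V
// Shown for a single head; mask/softcap and fp16 storage are omitted.
// This is a semantic model only, not the HVX/VTCM kernel from the PR.
#include <math.h>
#include <stddef.h>

static void flash_attn_ref(const float *Q,   // [n_q ][d] queries
                           const float *K,   // [n_kv][d] keys
                           const float *V,   // [n_kv][d] values
                           float       *out, // [n_q ][d] output
                           size_t n_q, size_t n_kv, size_t d, float scale) {
    for (size_t i = 0; i < n_q; i++) {
        // pass 1: max of the scaled scores, for a numerically stable softmax
        float max_s = -INFINITY;
        for (size_t j = 0; j < n_kv; j++) {
            float s = 0.0f;
            for (size_t k = 0; k < d; k++) s += Q[i*d + k] * K[j*d + k];
            s *= scale;
            if (s > max_s) max_s = s;
        }
        // pass 2: accumulate softmax weights and the weighted sum of V rows
        float sum = 0.0f;
        for (size_t k = 0; k < d; k++) out[i*d + k] = 0.0f;
        for (size_t j = 0; j < n_kv; j++) {
            float s = 0.0f;
            for (size_t k = 0; k < d; k++) s += Q[i*d + k] * K[j*d + k];
            float w = expf(s*scale - max_s);
            sum += w;
            for (size_t k = 0; k < d; k++) out[i*d + k] += w * V[j*d + k];
        }
        for (size_t k = 0; k < d; k++) out[i*d + k] /= sum; // normalize
    }
}
```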
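
The SCALE fp32 commit adds the elementwise scaling op to the Hexagon backend. Its per-element semantics are a single multiply by a constant; a minimal scalar reference follows (the name scale_f32_ref is illustrative, not one of the backend's HVX helpers):

```c
// Scalar reference for the fp32 SCALE op: dst[i] = s * src[i].
// The commit implements this with HVX vector helpers; this loop only
// documents the semantics being vectorized.
#include <stddef.h>

static void scale_f32_ref(const float *src, float *dst, size_t n, float s) {
    for (size_t i = 0; i < n; i++) {
        dst[i] = s * src[i];
    }
}
```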
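
The set-rows commits add an fp32 -> fp16 row scatter, where an i32/i64 index tensor picks the destination row for each source row. The sketch below is a plain-C model of that behavior under some assumptions: the compiler supports _Float16 (true of the Hexagon toolchain this backend targets), the indices are shown already widened to int64_t, and the names and layout are illustrative rather than taken from the ggml-hexagon sources.

```c
// Scalar reference for set-rows fp32 -> fp16 with explicit row indices.
// Each source row r is converted to fp16 and written to destination row
// row_idx[r]. Not the HVX implementation, just the op's semantics.
#include <stddef.h>
#include <stdint.h>

static void set_rows_f32_f16_ref(const float   *src,     // [n_rows][n_cols] fp32 source rows
                                 const int64_t *row_idx, // [n_rows] destination row indices
                                 _Float16      *dst,     // [n_dst_rows][n_cols] fp16 destination
                                 size_t n_rows, size_t n_cols) {
    for (size_t r = 0; r < n_rows; r++) {
        const float *s = src + r * n_cols;
        _Float16    *d = dst + (size_t) row_idx[r] * n_cols;
        for (size_t c = 0; c < n_cols; c++) {
            d[c] = (_Float16) s[c]; // fp32 -> fp16 conversion happens here
        }
    }
}
```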