llama.cpp
metal: SSM_SCAN performance
#14743
Merged

Commits
  • feat: Add s_off as a parameter in the args struct
    gabe-l-hart committed 268 days ago
  • perf: Parallelize mamba2 SSM_SCAN metal kernel over d_state
    gabe-l-hart committed 268 days ago
  • fix: Update logic to correctly do the multi-layer parallel sum
    gabe-l-hart committed 267 days ago
  • fix: Correctly size the shared memory buffer and assert expected size relationships
    gabe-l-hart committed 267 days ago
  • refactor: Compute block offsets once rather than once per token
    gabe-l-hart committed 267 days ago
  • feat: Use local variable for state recursion
    gabe-l-hart committed 264 days ago
  • feat: Use a secondary simd_sum instead of a for loop
    gabe-l-hart committed 263 days ago
  • feat: Add assertion and comment about relationship between simd size and num simd groups
    gabe-l-hart committed 263 days ago
  • feat: Parallelize over d_state for mamba-1
    gabe-l-hart committed 263 days ago
  • feat: Parallel sum in SSM_CONV
    gabe-l-hart committed 263 days ago
  • Revert "feat: Parallel sum in SSM_CONV"
    gabe-l-hart committed 262 days ago
  • Merge remote-tracking branch 'origin/master' into GraniteFourPerf
    gabe-l-hart committed 262 days ago
  • Merge remote-tracking branch 'origin/master' into GraniteFourPerf
    gabe-l-hart committed 260 days ago
  • refactor: Simplify shared memory sizing
    gabe-l-hart committed 260 days ago
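The commits above split SSM_SCAN work across d_state (one thread per state element), keep the recurrent state in a local variable across tokens, and reduce the per-element products with a first simd_sum followed by a second simd_sum over the per-SIMD-group partials held in shared memory. The Python sketch below simulates that pattern on the CPU under stated assumptions. It is not the Metal kernel from this PR. `simd_sum` is a plain-Python stand-in for Metal's `simd_sum`, `SIMD_WIDTH = 32` is an assumed SIMD-group width, and the recurrence is a simplified single-channel, scalar-A Mamba-2 style scan with softplus on dt. All helper names (`two_level_sum`, `ssm_scan_step`, `ssm_scan`) are hypothetical.

```python
import math

SIMD_WIDTH = 32  # assumed SIMD-group width for this sketch


def simd_sum(lanes):
    """Stand-in for Metal's simd_sum: the sum over one SIMD group."""
    return sum(lanes)


def two_level_sum(values, simd_width=SIMD_WIDTH):
    """Sum one value per simulated thread using two rounds of simd_sum.

    Round 1 reduces within each SIMD group and stores one partial per group
    in a simulated shared-memory array. Round 2 reduces those partials with
    a single simd_sum, which only works if the partials fit in one SIMD group.
    """
    n_groups = math.ceil(len(values) / simd_width)
    assert n_groups <= simd_width, "partials must fit in one SIMD group"
    shared = [0.0] * simd_width  # one slot per SIMD group, zero padded
    for g in range(n_groups):
        shared[g] = simd_sum(values[g * simd_width:(g + 1) * simd_width])
    return simd_sum(shared)


def ssm_scan_step(state, x, dt, a, b, c):
    """One token of a simplified Mamba-2 style scan for a single channel.

    state, b and c have length d_state; x, dt and a are scalars.
    Each index i plays the role of one thread. Returns (new_state, y).
    """
    dt_sp = math.log1p(math.exp(dt)) if dt <= 20.0 else dt  # softplus
    decay = math.exp(dt_sp * a)
    new_state = [s * decay + bi * x * dt_sp for s, bi in zip(state, b)]
    y = two_level_sum([s * ci for s, ci in zip(new_state, c)])
    return new_state, y


def ssm_scan(xs, dts, a, bs, cs, d_state):
    """Scan a token sequence, keeping the state in a local variable."""
    state = [0.0] * d_state  # read once, updated per token, returned once
    ys = []
    for x, dt, b, c in zip(xs, dts, bs, cs):
        state, y = ssm_scan_step(state, x, dt, a, b, c)
        ys.append(y)
    return state, ys
```

Because the second round is a single `simd_sum` over the partials, the number of SIMD groups per threadgroup cannot exceed the SIMD width. The assertion in `two_level_sum` expresses that constraint, which appears to be the kind of size relationship the assertion commits refer to.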