llama.cpp
metal: SSM_SCAN performance
#14743
Merged

gabe-l-hart feat: Add s_off as a parameter in the args struct
ba74a247
gabe-l-hart perf: Parallelize mamba2 SSM_SCAN metal kernel over d_state
8d5a25d3
github-actions added ggml
github-actions added Apple Metal
gabe-l-hart changed the title from "metail: SSM_SCAN performance" to "metal: SSM_SCAN performance" 51 days ago
gabe-l-hart force pushed from 26524d08 to 8d5a25d3 51 days ago
gabe-l-hart commented on 2025-07-17
gabe-l-hart fix: Update logic to correctly do the multi-layer parallel sum
e16e24be
gabe-l-hart commented on 2025-07-18
gabe-l-hart commented on 2025-07-18
gabe-l-hart fix: Correctly size the shared memory buffer and assert expected size …
21db0b59
gabe-l-hart force pushed from 0817add1 to 21db0b59 51 days ago
gabe-l-hart refactor: Compute block offsets once rather than once per token
a5334f91
compilade commented on 2025-07-18
gabe-l-hart feat: Use local variable for state recursion
3866f766
gabe-l-hart feat: Use a secondary simd_sum instead of a for loop
641276a8
gabe-l-hart feat: Add assertion and comment about relationship between simd size …
d06d0876
gabe-l-hart feat: Parallelize over d_state for mamba-1
80545ef5
gabe-l-hart feat: Parallel sum in SSM_CONV
16bc0596
compilade commented on 2025-07-22
gabe-l-hart Revert "feat: Parallel sum in SSM_CONV"
e55176a0
gabe-l-hart Merge remote-tracking branch 'origin/master' into GraniteFourPerf
f6d5e1ae
ggerganov approved these changes on 2025-07-25
gabe-l-hart Merge remote-tracking branch 'origin/master' into GraniteFourPerf
c3711e1d
gabe-l-hart refactor: Simplify shared memory sizing
d20b02d1
gabe-l-hart merged 793c0d7f into master 44 days ago
gabe-l-hart deleted the GraniteFourPerf branch 25 days ago