llama.cpp
metal: SSM_SCAN performance
#14743
Merged
Commits
feat: Add s_off as a parameter in the args struct
gabe-l-hart
committed
268 days ago
perf: Parallelize mamba2 SSM_SCAN metal kernel over d_state
gabe-l-hart
committed
268 days ago
fix: Update logic to correctly do the multi-layer parallel sum
gabe-l-hart
committed
267 days ago
fix: Correctly size the shared memory buffer and assert expected size relationships
gabe-l-hart
committed
267 days ago
refactor: Compute block offsets once rather than once per token
gabe-l-hart
committed
267 days ago
feat: Use local variable for state recursion
gabe-l-hart
committed
264 days ago
feat: Use a secondary simd_sum instead of a for loop
gabe-l-hart
committed
263 days ago
feat: Add assertion and comment about relationship between simd size and num simd groups
gabe-l-hart
committed
263 days ago
feat: Parallelize over d_state for mamba-1
gabe-l-hart
committed
263 days ago
feat: Parallel sum in SSM_CONV
gabe-l-hart
committed
263 days ago
Revert "feat: Parallel sum in SSM_CONV"
gabe-l-hart
committed
262 days ago
Merge remote-tracking branch 'origin/master' into GraniteFourPerf
gabe-l-hart
committed
262 days ago
Merge remote-tracking branch 'origin/master' into GraniteFourPerf
gabe-l-hart
committed
260 days ago
refactor: Simplify shared memory sizing
gabe-l-hart
committed
260 days ago