llama.cpp
HIP: Enable Matrix cores for MMQ Kernels, Enable stream-K for CDNA 3
#14624
Merged

IMbackK merged 21 commits into ggml-org:master from ROCm:amd-integration
deepsek committed 68da4e55: Feat: Enable MFMA instr for Q4_K
deepsek committed 79f348a0: Fix: Missed template param
deepsek committed 89ba8a6d: Feat: Add MFMA instr for Q6_K, remove MMQ_NWARPS
deepsek committed e57e5639: Merge branch 'ggml-org:master' into amd-integration
deepsek committed 9784a513: Merge branch 'ggml-org:master' into amd-integration
deepsek committed dad79b35: Merge branch 'ggml-org:master' into amd-integration
deepsek committed ff60fa9d: Perf: Fix Register Spilling Q6_K - Refactor kernel, launch_bound
deepsek committed e8eeb344: Perf: Refactor Q4_K, reduce register pressure
deepsek committed a1619007: Perf: Throughput Increase 4k->6.9k t/s
deepsek committed 75d386af: Perf: 7.1k tokens/sec
deepsek committed 0215a802: Perf/Feat: Throughput 8.3k tokens/sec, Add support for all quants
deepsek committed aa35febd: Feat: Remove warnings, deprecated __AMDGCN_WAVEFRONT_SIZE
deepsek committed ba17f62e: Merge branch 'master' into amd-integration
deepsek committed 5ab14910: Feat: Enable stream-k for CDNA3
deepsek requested a review from JohannesGaessler 120 days ago
github-actions added the Nvidia GPU label
github-actions added the ggml label
deepsek committed fb2fd314: Fix: Remove Trailing Whitespaces
deepsek committed b55d44a7: Fix: Unused Params Warnings, CUDA Build
deepsek committed ab7c0072: -p512: 8.4k->9.5k - Account for DataPadding for writing tile_y
deepsek requested a review from ngxson 116 days ago
github-actions added the devops label
IMbackK assigned IMbackK 110 days ago
JohannesGaessler commented on 2025-07-21
deepsek commented on 2025-07-21
JohannesGaessler commented on 2025-07-21
deepsek committed 279b51e0: refactor: PR code cleanup, amd_mma_available->amd_mfma_available
JohannesGaessler commented on 2025-07-22
deepsek committed a2a336b4: refactor: PR cleanup
deepsek committed b489b4e2: Feat/Perf: redesign all quants use same tile size, mfma instr, nwarps
deepsek committed 4e6d54f6: Fix: CI fail - unused parameter Werror
ggml-org deleted a comment from deepsek on 2025-07-24
JohannesGaessler commented on 2025-07-24
JohannesGaessler approved these changes on 2025-07-24
JohannesGaessler requested a review from IMbackK 106 days ago
IMbackK approved these changes on 2025-07-26
IMbackK merged commit 66906cd8 into master 104 days ago
