HIP: Enable Matrix cores for MMQ Kernels, Enable stream-K for CDNA 3 #14624
Feat: Enable MFMA instr for Q4_K (68da4e55)
Fix: Missed template param (79f348a0)
Feat: Add MFMA instr for Q6_K, remove MMQ_NWARPS (89ba8a6d)
Merge branch 'ggml-org:master' into amd-integration (e57e5639)
Merge branch 'ggml-org:master' into amd-integration (9784a513)
Merge branch 'ggml-org:master' into amd-integration (dad79b35)
Perf: Fix Register Spilling Q6_K - Refactor kernel, launch_bound (ff60fa9d)
Perf: Refactor Q4_K, reduce register pressure (e8eeb344)
Perf: Throughput Increase 4k->6.9k t/s (a1619007)
Perf: 7.1k tokens/sec (75d386af)
Perf/Feat: Throughput 8.3k tokens/sec, Add support for all quants (0215a802)
Feat: Remove warnings, deprecated __AMDGCN_WAVEFRONT_SIZE (aa35febd)
Merge branch 'master' into amd-integration (ba17f62e)
Feat: Enable stream-k for CDNA3 (5ab14910)
Fix: Remove Trailing Whitespaces (fb2fd314)
Fix: Unused Params Warnings, CUDA Build (b55d44a7)
-p512: 8.4k->9.5k - Account for DataPadding for writing tile_y (ab7c0072)
refactor: PR code cleanup, amd_mma_available->amd_mfma_available (279b51e0)
refactor: PR cleanup (a2a336b4)
Feat/Perf: redesign all quants use same tile size, mfma instr, nwarps (b489b4e2)
Fix: CI fail - unused parameter Werror (4e6d54f6)
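The "Enable stream-k for CDNA3" commit refers to the stream-K style of GEMM work decomposition. The Python sketch below roughly illustrates that general idea. It is not the ggml HIP kernel code. It flattens the (output tile, k-iteration) space of a tiled matrix multiply and splits it as evenly as possible across a fixed number of workgroups. `stream_k_partition`, `Segment`, and the example sizes are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    block: int    # workgroup that processes this slice
    tile: int     # flattened output-tile index
    k_begin: int  # first k-iteration of the slice (inclusive)
    k_end: int    # last k-iteration of the slice (exclusive)


def stream_k_partition(tiles: int, iters_per_tile: int, num_blocks: int) -> list:
    """Split tiles * iters_per_tile K-loop iterations evenly over num_blocks workgroups.

    A workgroup's contiguous range may start or end inside a tile; such partial
    tiles have to be combined afterwards (a "fixup" step) to form the final output.
    """
    total = tiles * iters_per_tile
    segs = []
    for b in range(num_blocks):
        it = total * b // num_blocks            # start of this block's range
        it_end = total * (b + 1) // num_blocks  # end of this block's range (exclusive)
        while it < it_end:
            tile, k_begin = divmod(it, iters_per_tile)
            # Stop at the end of the current tile so each segment covers exactly one tile.
            stop = min(it_end, (tile + 1) * iters_per_tile)
            segs.append(Segment(b, tile, k_begin, k_begin + (stop - it)))
            it = stop
    return segs


if __name__ == "__main__":
    # Hypothetical sizes: 5 output tiles, 8 k-iterations per tile, 4 workgroups.
    tiles, iters, blocks = 5, 8, 4
    for s in stream_k_partition(tiles, iters, blocks):
        partial = "" if (s.k_begin, s.k_end) == (0, iters) else "  (partial, needs fixup)"
        print(f"block {s.block}: tile {s.tile}, k [{s.k_begin}, {s.k_end}){partial}")
```

With 5 tiles and 4 workgroups, a plain one-tile-per-workgroup schedule would need a second wave in which only one workgroup is busy. The even split instead gives every workgroup 10 iterations, at the cost of reducing partial tiles afterwards.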
IMbackK approved these changes on 2025-07-26
IMbackK merged commit 66906cd8 into master 104 days ago
Labels: Nvidia GPU, devops, ggml