llama.cpp
Add GGML_HIP_ROCWMMA_FATTN to enable rocWMMA for FlashAttention
#12032

Merged


IMbackK merged 14 commits into ggml-org:master from pr

Add GGML_HIP_ROCWMMA_FATTN and rocwmma header check

206d22bd

Add rocWMMA support

02369da4

hjc4869 requested a review from JohannesGaessler 1 year ago

github-actions added the Nvidia GPU and ggml labels

Merge branch 'master' into pr

547115da

JohannesGaessler commented on 2025-02-23

Update ggml/src/ggml-hip/CMakeLists.txt

419f1ea9

Move comments to reduce confusion.

828577a9

Use namespace alias `wmma` instead of lots of ifdefs.

9d27c38b

Fix: FP16_MMA_AVAILABLE should not be checked in host code.

19272bfa

JohannesGaessler commented on 2025-02-25

Always return false in `fp16_mma_available` when compiling for HIP an…

29debe14

Remove the Q->ne[1] > 8 check

5d4ab04c

Also always return false in fp16_mma_hardware_available when compiled…

55169095

JohannesGaessler commented on 2025-02-25

Revert "Also always return false in fp16_mma_hardware_available when …

fea171f5

IMbackK assigned IMbackK 1 year ago

ggml: Make fattn use hardware warp size instead of 32

a90f4cb7

ggml: Make fattn kernel use launch bounds w/HIP

a135b4c7

IMbackK requested changes on 2025-03-03

Use GGML_CUDA_CC_IS_CDNA for checking CDNA architectures.

373d48ef

hjc4869 requested a review from IMbackK 1 year ago

IMbackK approved these changes on 2025-03-03

IMbackK merged becade5d into master 1 year ago

hjc4869 deleted the pr branch 1 year ago

JohannesGaessler commented on 2025-03-06

Reviewers: IMbackK, JohannesGaessler

Assignees: IMbackK

Labels: Nvidia GPU, ggml

Milestone: No milestone
