HIP: add mmf for CDNA (#18896)
* refactor mmf rows_per_block
* speed up compile
* pass cdna compile
* fix cuda error
* clean up mmf
* f32 mmf
* clean float mma
* fix mmf error
* faster mmf
* extend tile k
* fix compile error
* Revert "extend tile k"
This reverts commit 4d2ef3d483932659801a59a5af0b6b48f6ffd5c7.
* fix smem overflow
* speed up compiling mmf
* speed up compile for hip
* 512 block for cdna
* config pad size
* fix as comment
* update select logic
* move some code to cuh
* fix as comment
* correct cdna3 config
---------
Co-authored-by: zhang hui <you@example.com>