llama.cpp
a972faeb - CUDA: Add mul_mat_id support for the mmf kernel (#15767)

Commit

32 days ago

CUDA: Add mul_mat_id support for the mmf kernel (#15767)

* CUDA: Add mul_mat_id support the mmf
  Add support for mul_mat_id for bs < 16
* Review: use warp_size, fix should_use_mmf condition
* Launch one block per expert, stride along n_expert_used
* templatize mul_mat_id
* Pad shmem to 16 bytes, add helper function mul_mat_f_switch_ids
* Reduce compile times by dividing mmf into f16, bf16 and f32 variants
* Divide mmf by ncols_dst
* Add missing files
* Fix MUSA/HIP builds
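For readers unfamiliar with the operation, the following is a minimal CPU sketch of what a mul_mat_id-style op computes: each token picks n_expert_used expert weight matrices by index, and its input vector is multiplied by each of them. This is not the CUDA kernel from this commit. The function name mul_mat_id_ref, the row-major layouts, and the simplification that every selected expert sees the same per-token input are all assumptions made for illustration.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical reference, not code from the commit.
// experts[e] : row-major (n_rows x n_cols) weight matrix of expert e
// x          : row-major (n_tokens x n_cols) input activations
// ids        : row-major (n_tokens x n_expert_used) expert indices
// returns    : dst laid out as (n_tokens x n_expert_used x n_rows)
std::vector<float> mul_mat_id_ref(
        const std::vector<std::vector<float>> & experts,
        const std::vector<float> & x,
        const std::vector<int> & ids,
        std::size_t n_rows, std::size_t n_cols,
        std::size_t n_tokens, std::size_t n_expert_used) {
    std::vector<float> dst(n_tokens * n_expert_used * n_rows, 0.0f);
    for (std::size_t t = 0; t < n_tokens; ++t) {
        for (std::size_t s = 0; s < n_expert_used; ++s) {
            // Select the expert assigned to slot s of token t.
            const std::vector<float> & w = experts[ids[t * n_expert_used + s]];
            for (std::size_t r = 0; r < n_rows; ++r) {
                float acc = 0.0f;
                for (std::size_t c = 0; c < n_cols; ++c) {
                    acc += w[r * n_cols + c] * x[t * n_cols + c];
                }
                dst[(t * n_expert_used + s) * n_rows + r] = acc;
            }
        }
    }
    return dst;
}
```

The sketch iterates token by token for clarity. The commit message instead describes launching one block per expert and striding along n_expert_used, which groups the work by expert rather than by token.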

References

#15767 - CUDA: Add mul_mat_id support for the mmf kernel

Author

am17an

Parents

550cf726
