onnxruntime
53ee6c54 - [CUDA] FpA IntB Gemm Weight Conversion in GPU (#24914)

Commit
208 days ago
### Description

Implement the fpA intB GEMM weight preprocessing as a CUDA kernel to speed up weight prepacking.

### Motivation and Context

The original preprocessing code (in https://github.com/microsoft/onnxruntime/pull/24854) runs on the CPU, which is slow and needs an extra memory copy between CPU and GPU.