onnxruntime
53ee6c54 - [CUDA] FpA IntB Gemm Weight Conversion in GPU (#24914)

Commit

233 days ago

[CUDA] FpA IntB Gemm Weight Conversion in GPU (#24914)

### Description

Implement the fpA intB gemm weight preprocessing as a CUDA kernel to speed up weight prepacking.

### Motivation and Context

The original preprocessing code (in https://github.com/microsoft/onnxruntime/pull/24854) runs on the CPU, which is slow and needs an extra memory copy between CPU and GPU.

References

#24914 - [CUDA] FpA IntB Gemm Weight Conversion in GPU

Author

tianleiwu

Parents

03b22ffc
