onnxruntime
22543919 - Optimize Concat and Split on CUDA to eliminate host-to-device copies when sizes are all the same (#8833)

Commit
4 years ago
Optimize Concat and Split on CUDA to eliminate host-to-device copies when sizes are all the same (#8833)

* special case concat and split when sizes are equal
* add tests for 16 and 32 inputs with same dim
* add tests for 16/64 inputs on concat or 16/64 outputs on split
* try eliminate windows warning
* outter => outer
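To illustrate why equal sizes allow a special case, here is a minimal CPU-side sketch in Python. It is not the ONNX Runtime CUDA kernel. The function names and the flat row-major layout `(outer, axis, inner)` are assumptions made for illustration. The general path needs a per-input offset table to find which input owns each output position; on a GPU, a table like that is the kind of data that would have to be copied from host to device. When every input has the same size along the concat axis, one scalar and a `divmod` replace the table.

```python
from bisect import bisect_right
from itertools import accumulate


def concat_general(inputs, axis_sizes, outer, inner):
    """Concat flat row-major inputs of shape (outer, axis_sizes[i], inner).

    Hypothetical helper: uses a prefix-sum table over the per-input axis sizes.
    """
    offsets = list(accumulate(axis_sizes))  # end offset of each input along the axis
    total = offsets[-1]
    out = [None] * (outer * total * inner)
    for o in range(outer):
        for a in range(total):
            i = bisect_right(offsets, a)            # table lookup: which input owns a
            local = a - (offsets[i] - axis_sizes[i])  # position inside that input
            for k in range(inner):
                src = (o * axis_sizes[i] + local) * inner + k
                out[(o * total + a) * inner + k] = inputs[i][src]
    return out


def concat_same_size(inputs, axis_size, outer, inner):
    """Same result when all inputs share axis_size: no table, only one scalar."""
    total = axis_size * len(inputs)
    out = [None] * (outer * total * inner)
    for o in range(outer):
        for a in range(total):
            i, local = divmod(a, axis_size)  # plain arithmetic replaces the lookup
            for k in range(inner):
                src = (o * axis_size + local) * inner + k
                out[(o * total + a) * inner + k] = inputs[i][src]
    return out


a = [1, 2, 3, 4]  # shape (2, 2, 1)
b = [5, 6, 7, 8]  # shape (2, 2, 1)
result = concat_same_size([a, b], axis_size=2, outer=2, inner=1)
```

Split is the mirror image: with equal output sizes, the destination output and the offset inside it follow from the same `divmod` on the axis index.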
Author
Suffian Khan