pytorch
66979fbf - Improve complex lerp performance (#84844)

Commit
3 years ago
Improve complex lerp performance (#84844)

The complex lerp kernel uses `std::abs(z) < 0.5`, which involves computing a sqrt. Comparing the squared magnitude against 0.25 instead has much lower latency and so performs much better overall.

In a simple timeit benchmark I see more than 10x speedup on CPU for a 4096 element complex lerp, from 84 us to 6.7 us.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/84844
Approved by: https://github.com/ngimel
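The idea behind the change can be sketched in plain Python. This is an illustration only, not the kernel code from the PR: the function names `weight_is_small_abs`, `weight_is_small_sq`, and `lerp` are hypothetical, and the two-branch lerp formula is an assumption about how the weight test is used. The point is that `|z| < 0.5` and `re(z)^2 + im(z)^2 < 0.25` select the same branch (up to rounding right at the boundary), while the second form avoids the square root.

```python
def weight_is_small_abs(z: complex) -> bool:
    # Original-style test: abs() on a complex number computes
    # sqrt(re^2 + im^2) (a hypot), which is the costly part.
    return abs(z) < 0.5


def weight_is_small_sq(z: complex) -> bool:
    # Rewritten test: compare the squared magnitude against 0.5^2 = 0.25.
    # Only multiplies and one add; no square root.
    return z.real * z.real + z.imag * z.imag < 0.25


def lerp(start: complex, end: complex, weight: complex) -> complex:
    # Assumed two-branch formulation (hypothetical sketch): pick the
    # expression anchored at whichever endpoint the weight is closer to.
    weight = complex(weight)
    diff = end - start
    if weight_is_small_sq(weight):
        return start + weight * diff
    return end - diff * (1 - weight)


if __name__ == "__main__":
    samples = [0.3 + 0.3j, 0.4 + 0.4j, 0.1 - 0.2j, -0.6 + 0.0j]
    for z in samples:
        # Both predicates agree away from the |z| == 0.5 boundary.
        print(z, weight_is_small_abs(z), weight_is_small_sq(z))
```

Exactly at the boundary the two tests may round differently, which is harmless if, as assumed here, both branches compute the same interpolation in exact arithmetic.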