pytorch
3ec71fce - Improve make_tensor performance for float and complex types (#85473)

Commit
2 years ago
Improve make_tensor performance for float and complex types (#85473)

For floating types, `make_tensor` calls `rand` and then does a linear interpolation from `low` to `high`. This instead calls `uniform_(low, high)` to cut out the interpolation step.

For complex types, `make_tensor` does the `rand` + interpolation step twice and calls `torch.complex(real, imag)` at the end. This instead uses `view_as_real` and `uniform_(low, high)` to fuse it all into one operation.

My benchmarks show significant speedups in all cases for float32 and complex64.

| Device | dtype     | Size  | Master (us) | This PR (us) | Speedup |
|--------|-----------|-------|-------------|--------------|---------|
| CPU    | float32   | 8     | 19.4        | 6.34         | 3.1     |
|        |           | 4096  | 36.8        | 21.3         | 1.7     |
|        |           | 2**24 | 167,000     | 80,500       | 2.1     |
|        | complex32 | 8     | 37.0        | 7.57         | 4.9     |
|        |           | 4096  | 73.1        | 37.6         | 1.9     |
|        |           | 2**24 | 409,000     | 161,000      | 2.5     |
| CUDA   | float32   | 8     | 40.4        | 11.7         | 3.5     |
|        |           | 4096  | 38.7        | 11.7         | 3.3     |
|        |           | 2**24 | 2,300       | 238          | 9.7     |
|        | complex32 | 8     | 78.7        | 14           | 5.6     |
|        |           | 4096  | 82.7        | 13.8         | 6.0     |
|        |           | 2**24 | 5,520       | 489          | 11.3    |

Pull Request resolved: https://github.com/pytorch/pytorch/pull/85473
Approved by: https://github.com/mruberry
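
The two strategies described in the message can be sketched roughly as follows. This is a minimal illustration, not the actual `make_tensor` code: the helper names (`float_via_rand`, `float_via_uniform`, `complex_via_rand`, `complex_via_uniform`) are hypothetical, and the real function handles more dtypes, devices, and options than shown here.

```python
import torch


def float_via_rand(shape, low, high, dtype=torch.float32):
    # Old approach (as described above): sample in [0, 1), then
    # linearly interpolate onto [low, high). Costs extra elementwise ops.
    return low + (high - low) * torch.rand(shape, dtype=dtype)


def float_via_uniform(shape, low, high, dtype=torch.float32):
    # New approach: fill an uninitialized tensor in place, sampling
    # directly from the target range.
    return torch.empty(shape, dtype=dtype).uniform_(low, high)


def complex_via_rand(shape, low, high):
    # Old approach: build real and imaginary parts separately,
    # then combine them with torch.complex (hypothetical complex64-only helper).
    real = float_via_rand(shape, low, high, torch.float32)
    imag = float_via_rand(shape, low, high, torch.float32)
    return torch.complex(real, imag)


def complex_via_uniform(shape, low, high):
    # New approach: view the complex tensor as a real tensor with a
    # trailing dimension of size 2 and fill both parts in one in-place call.
    result = torch.empty(shape, dtype=torch.complex64)
    torch.view_as_real(result).uniform_(low, high)
    return result


if __name__ == "__main__":
    torch.manual_seed(0)
    x = float_via_uniform((4096,), -1.0, 2.0)
    z = complex_via_uniform((4096,), -1.0, 2.0)
    print(x.dtype, tuple(x.shape), z.dtype, tuple(z.shape))
```

Because `torch.view_as_real` returns a view that shares storage with the complex tensor, the in-place `uniform_` call fills the real and imaginary parts together, which is the fusion the message refers to.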