58424446 - use float as accumulate type for reduce Ops: min, max, minmax on CPU (#96079)

Use float32 as the accumulate type for `min`, `max`, and `minmax` on CPU: in `vec::reduce_all`, float16 inputs will be accumulated in float32. The performance benefit comes primarily from the vectorization of `Half` added in https://github.com/pytorch/pytorch/pull/96076.

Tested on Intel(R) Xeon(R) Gold 6248 CPU @ 2.50GHz

**single socket**
```
(before)
### using OMP_NUM_THREADS=20
### using numactl --physcpubind=0-19 --membind=0
max: size: torch.Size([64, 128, 1024])    2.071 ms

(after)
### using OMP_NUM_THREADS=20
### using numactl --physcpubind=0-19 --membind=0
max: size: torch.Size([64, 128, 1024])    0.071 ms
```

**single core**
```
(before)
### using OMP_NUM_THREADS=1
### using numactl --physcpubind=0 --membind=0
max: size: torch.Size([64, 128, 1024])    33.488 ms

(after)
### using OMP_NUM_THREADS=1
### using numactl --physcpubind=0 --membind=0
max: size: torch.Size([64, 128, 1024])    0.953 ms
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/96079

Approved by: https://github.com/jgong5, https://github.com/kit1980
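The result stays bit-identical for these ops, since upcasting float16 to float32 is exact and min/max only select an existing element. A minimal sketch of the accumulate-in-float32 pattern, using numpy purely for illustration (the actual change lives in ATen's C++ `vec::reduce_all`, not in Python):

```python
import numpy as np

# Hypothetical illustration: reduce a float16 tensor while
# accumulating in float32, as the commit does for min/max/minmax.
rng = np.random.default_rng(0)
x = rng.standard_normal((64, 128, 1024)).astype(np.float16)

# Upcast once, reduce in float32 (exact for float16 inputs),
# then cast the scalar result back to the input dtype.
acc = x.astype(np.float32).max()
result = np.float16(acc)

# The selected element is unchanged, so the answer matches a
# pure-float16 reduction.
assert result == x.max()
```

Because the upcast is lossless, the float32 accumulation path is free to use wider vectorized loads and arithmetic without changing the reduction's output.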