_min_max_val.dim: CPU implementation (#42894)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42894
Continuing the min_max kernel implementation, this PR adds the
CPU path for when a `dim` is specified. The next PR will replicate this for CUDA.
Note: after a discussion with ngimel, we are taking the fast path
of computing only the values and not the indices, since the values are
all that quantization needs; computing the indices as well would require
support for reductions with 4 outputs, which is additional work. As a
result, the API doesn't fully match `min.dim` and `max.dim`.
I'm flexible on the name; let me know if something else is better.
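For context, a minimal sketch of the API difference (the Python binding name `torch._min_max_val` is an assumption based on the title; the op added here returns values only):
```
import torch

x = torch.randn(2, 3, 4)

# Existing ops follow the `min.dim` / `max.dim` API: (values, indices) tuples.
min_vals, min_idxs = torch.min(x, dim=1)
max_vals, max_idxs = torch.max(x, dim=1)

# The op in this PR computes both reductions in a single pass over `x` but
# returns only the values, which is all quantization observers need.
# Binding name assumed, not confirmed by this commit:
# min_vals, max_vals = torch._min_max_val(x, dim=1)
```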
Test Plan:
correctness:
```
python test/test_torch.py TestTorchDeviceTypeCPU.test_minmax_cpu_float32
```
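The values-only output can be checked against the existing reductions; a sketch of that comparison, using the same hypothetical binding as above:
```
import torch

def check_minmax_values(x, dim):
    # Reference results from the existing single-output reductions.
    ref_min = torch.min(x, dim=dim).values
    ref_max = torch.max(x, dim=dim).values
    # The fused op should reproduce these values exactly (binding assumed):
    # mins, maxs = torch._min_max_val(x, dim=dim)
    # assert torch.equal(mins, ref_min) and torch.equal(maxs, ref_max)

check_minmax_values(torch.randn(5, 7, 11), dim=2)
```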
performance: seeing a 49% speedup for a fused min+max on tensors with
shapes similar to the ones we care about for quantization observers (bench:
https://gist.github.com/vkuzo/b3f24d67060e916128a51777f9b89326). For
other shapes (more dims, different dim sizes, etc.), I've seen speedups
as low as 20%, but we don't have a good use case to optimize for those,
so perhaps we can save that for a future PR.
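A rough outline of the kind of comparison the gist runs (the shape below is a placeholder, not one of the benchmarked shapes):
```
import torch
import torch.utils.benchmark as benchmark

# Placeholder shape; the observer-like shapes are in the linked gist.
x = torch.randn(1024, 1024)

# Baseline: two separate passes over the tensor.
t_separate = benchmark.Timer(
    stmt="torch.min(x, dim=0); torch.max(x, dim=0)",
    globals={"x": x, "torch": torch},
)
print(t_separate.timeit(100))

# Fused single-pass op from this PR (binding name assumed):
# t_fused = benchmark.Timer(
#     stmt="torch._min_max_val(x, dim=0)",
#     globals={"x": x, "torch": torch},
# )
# print(t_fused.timeit(100))
```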
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D23086798
fbshipit-source-id: b24ce827d179191c30eccf31ab0b2b76139b0ad5