Use aten's GRAIN_SIZE for TH Tensor ops (#28770)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/28198 in my tests on a 24-core AMD Threadripper.
Profiling the benchmark showed that most of the slowdown in https://github.com/pytorch/pytorch/issues/28198 came from `THFloatTensor_fill` not being distributed across threads. Internally it uses `TH_TENSOR_APPLY_CONTIG`, a thin wrapper around `at::parallel_for` that uses `TH_OMP_OVERHEAD_THRESHOLD` (100,000) as the grain size.
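For context, `at::parallel_for` only splits a range across threads when its length exceeds the grain size, so a threshold of 100,000 keeps moderately sized fills on a single thread. A minimal sketch of that behaviour (illustrative only, not the actual `TH_TENSOR_APPLY_CONTIG` expansion):
```cpp
#include <ATen/Parallel.h>
#include <cstdint>

// Illustrative contiguous fill. at::parallel_for only distributes the
// range [0, numel) across threads when numel exceeds grain_size, so with
// grain_size = 100,000 a 50,000-element tensor runs entirely on one thread.
void fill_contig(float* data, int64_t numel, float value, int64_t grain_size) {
  at::parallel_for(0, numel, grain_size, [&](int64_t begin, int64_t end) {
    for (int64_t i = begin; i < end; ++i) {
      data[i] = value;
    }
  });
}
```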
Here I've changed it to use `at::internal::GRAIN_SIZE`, which is 32,768, i.e. roughly 1/3 of the old value. I think it makes sense to unify these two values so that any future tuning in `ATen` also applies to `TH`. It's not entirely clear to me what the "uncertain", "ordin" and "hyper" variants are meant to represent, but I've kept them at roughly the same ratio to `TH_OMP_OVERHEAD_THRESHOLD` as before.
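Concretely, the change amounts to defining the TH thresholds in terms of the ATen constant, along these lines (a sketch: the variant macro names mirror those mentioned above, and the divisors here are placeholders rather than the exact values in the PR):
```cpp
#include <ATen/Parallel.h>

// Tie the TH thresholds to ATen's grain size so that future tuning of
// at::internal::GRAIN_SIZE propagates to TH automatically.
// NOTE: the divisors below are illustrative; the PR keeps each variant
// at roughly its old ratio to TH_OMP_OVERHEAD_THRESHOLD.
#define TH_OMP_OVERHEAD_THRESHOLD            at::internal::GRAIN_SIZE
#define UNCERTAIN_TH_OMP_OVERHEAD_THRESHOLD  (at::internal::GRAIN_SIZE / 2)
#define ORDIN_TH_OMP_OVERHEAD_THRESHOLD      (at::internal::GRAIN_SIZE / 4)
#define HYPER_TH_OMP_OVERHEAD_THRESHOLD      (at::internal::GRAIN_SIZE / 16)
```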
Here are the timing results I get on the benchmark from the linked issue:
| Version | Full iteration time | `index_select` | `mm` | `addmm` |
|:----------:|---------------:|-------------:|---------:|---------:|
| master | 3505.85 ms/it | 184.302 ms | 9.520 ms | 8.494 ms |
| no scaling | 3453.18 ms/it | 184.456 ms | 5.810 ms | 5.069 ms |
| this PR | 3453.23 ms/it | 184.526 ms | 5.824 ms | 5.202 ms |
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28770
Differential Revision: D18202646
Pulled By: ezyang
fbshipit-source-id: ab30e5ef24e62213f9bd3abace5c6442c75c9854