pytorch
edf8130e - [PyTorch] Add set_data_ptr_noswap & use where possible (#52244)

Commit
4 years ago
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52244

`StorageImpl::set_data_ptr` returns the old pointer and thus has to do extra work. Found because `std::swap<at::DataPtr>` was showing up in profiling, although at < 1%.

ghstack-source-id: 121795131

Test Plan:
Run AdIndexer benchmark under `perf stat`.

Before:
```
      17,990.01 msec task-clock        #    0.998 CPUs utilized     ( +-  0.43% )
          6,550      context-switches  #    0.364 K/sec             ( +- 31.42% )
              3      cpu-migrations    #    0.000 K/sec             ( +-  7.14% )
        103,820      page-faults       #    0.006 M/sec             ( +-  2.47% )
 35,610,511,494      cycles            #    1.979 GHz               ( +-  0.40% )  (50.03%)
 71,651,045,779      instructions      #    2.01  insn per cycle    ( +-  0.07% )  (50.02%)
 11,679,947,910      branches          #  649.246 M/sec             ( +-  0.10% )  (50.03%)
     69,088,927      branch-misses     #    0.59% of all branches   ( +-  0.24% )  (50.06%)
```

After:
```
      17,896.20 msec task-clock        #    0.999 CPUs utilized     ( +-  0.24% )
          4,011      context-switches  #    0.224 K/sec             ( +- 27.77% )
              3      cpu-migrations    #    0.000 K/sec
        100,350      page-faults       #    0.006 M/sec             ( +-  1.58% )
 35,418,702,208      cycles            #    1.979 GHz               ( +-  0.23% )  (50.05%)
 71,449,334,935      instructions      #    2.02  insn per cycle    ( +-  0.09% )  (50.03%)
 11,652,819,899      branches          #  651.134 M/sec             ( +-  0.12% )  (50.04%)
     69,744,411      branch-misses     #    0.60% of all branches   ( +-  0.53% )  (50.06%)
```

The cycles difference is within the noise, but it looks like we have a 0.28% instruction count win (71,651,045,779 down to 71,449,334,935), which is outside the noise and fits with the intuition that this should be better.

Reviewed By: hlu1

Differential Revision: D26437297

fbshipit-source-id: bf0fceccf6ad78f1497b03ccb4cdfd1a21c6846c
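The difference between the two setters described in the summary can be sketched as follows. This is a minimal illustration, not the PyTorch source: `ToyStorage`, `ToyDataPtr`, and `replace_without_swap` are hypothetical names, and `std::unique_ptr<int>` stands in for `at::DataPtr` on the assumption that the real type is likewise a move-only owning handle. Only the method names `set_data_ptr` and `set_data_ptr_noswap` come from the commit.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical stand-in for at::DataPtr: a move-only owning pointer.
using ToyDataPtr = std::unique_ptr<int>;

// Hypothetical, simplified stand-in for StorageImpl.
class ToyStorage {
 public:
  explicit ToyStorage(ToyDataPtr p) : data_ptr_(std::move(p)) {}

  // Installs the new pointer and hands the previous one back to the caller.
  // The swap plus the returned temporary is the "extra work" when the
  // caller does not actually need the old pointer.
  ToyDataPtr set_data_ptr(ToyDataPtr&& new_ptr) {
    std::swap(data_ptr_, new_ptr);
    return std::move(new_ptr);
  }

  // Installs the new pointer and returns nothing; the previous pointer is
  // released by the move assignment itself.
  void set_data_ptr_noswap(ToyDataPtr&& new_ptr) {
    data_ptr_ = std::move(new_ptr);
  }

  const int* get() const { return data_ptr_.get(); }

 private:
  ToyDataPtr data_ptr_;
};

// Hypothetical helper: replaces the stored value without asking for the
// old pointer, then returns the value now held by the storage.
inline int replace_without_swap(ToyStorage& s, int v) {
  s.set_data_ptr_noswap(std::make_unique<int>(v));
  return *s.get();
}
```

Under these assumptions, call sites that discard the return value of `set_data_ptr` are the ones that can switch to the `_noswap` variant, which matches the "use where possible" wording in the commit title.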