pytorch
74993dcf - Add repeats to Timer.collect_callgrind(...) (#53295)

Commit
3 years ago
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53295

A lot of the time spent in `collect_callgrind` goes to spinning up Valgrind and executing the initial `import torch`. In most cases the actual run loop is a much smaller fraction. As a result, we can reuse the same process for multiple replicates and do a much better job of amortizing that startup cost. This also tends to result in more stable measurements: the kth run is more repeatable than the first because everything has been given a chance to settle into a steady state.

The instruction microbenchmarks lean heavily on this behavior. I found that in practice doing several `n=100` replicates is more reliable than one monolithic 10,000+ iteration run, since rare cases like memory consolidation will contaminate just that one replicate, as opposed to getting mixed into the entire long run.

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D26907093

Pulled By: robieta

fbshipit-source-id: 72e5b48896911f5dbde96c8387845d7f9882fdb2
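The amortization and contamination arguments in the message can be illustrated with a toy cost model. This is only a sketch in plain Python: it does not call Valgrind or `torch`, and the function name `total_cost` and every number in it (startup cost, per-iteration cost, per-replicate instruction counts) are hypothetical values chosen for illustration, not measurements from this commit.

```python
from statistics import median


def total_cost(startup, per_iter, n, repeats, reuse_process):
    """Cost (arbitrary units) of `repeats` replicates of `n` iterations each.

    Hypothetical model: each process launch pays `startup` once (standing in
    for Valgrind spin-up plus `import torch`); each iteration pays `per_iter`.
    """
    launches = 1 if reuse_process else repeats
    return launches * startup + repeats * n * per_iter


# Assumed, made-up costs: startup dominates a single n=100 run loop.
STARTUP = 1000.0
PER_ITER = 1.0

# Ten replicates, each in a fresh process vs. all in one reused process.
separate = total_cost(STARTUP, PER_ITER, n=100, repeats=10, reuse_process=False)
shared = total_cost(STARTUP, PER_ITER, n=100, repeats=10, reuse_process=True)

# Made-up instruction counts for five n=100 replicates; the third one is
# contaminated by a rare event (e.g. memory consolidation).
N = 100
replicate_counts = [5000, 5000, 9000, 5000, 5000]

# One monolithic run mixes the rare event into its single per-iteration figure.
monolithic_per_iter = sum(replicate_counts) / (N * len(replicate_counts))

# Separate replicates confine the event to one sample, so a robust statistic
# such as the median can ignore it.
replicate_per_iter = median(count / N for count in replicate_counts)

print(separate, shared)
print(monolithic_per_iter, replicate_per_iter)
```

Under these assumed numbers, reusing the process pays the startup cost once instead of ten times, and the median over replicates is unaffected by the single contaminated replicate, whereas the monolithic figure is shifted by it.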
Author
Taylor Robie