pytorch
505e0fdf - Add lazy_bench.py to measure trace overhead and compute efficiency (#68563)

3 years ago
Add lazy_bench.py to measure trace overhead and compute efficiency (#68563)

This tool adds 2 new 'experiments' and builds on infra from @jansel's torchdynamo bench script (torchdynamo/torchbench.py). The infra components take care of:
- iterating over torchbench models in a more convenient way, handling filtering and errors
- correctness checks
- mixing in non-torchbenchmark benchmarks
- interleaving measurements of the control/experiment and computing statistical significance
- hooks for synchronization that can be specialized for cuda or lazytensor
- custom sync modes that allow syncing after every step, or running many async steps before syncing

The overhead experiment compares the lazy trace overhead with the full cuda execution time. This may sound like a strange choice, but the point is to provide a reference point for how much time lazy tracing takes relative to the time eager mode normally spends in execution. The expectation is that it is a small fraction.
- An alternative approach is to measure lazy tracing against the portion of eager execution that launches cuda kernels but never syncs. If we could guarantee that the cuda driver never forced syncs, this would be fair, but it wouldn't work for CPU. At least the full-sync approach is consistent.
- Another alternative is to compare lazy trace overhead to execution with meta tensors. We don't expect meta tensors to work for this case, since we know that many of the ops we lazy-trace are not yet structured kernels.
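The interleaved control/experiment measurement with a significance check can be sketched as below. This is a minimal illustration, not the actual lazy_bench.py code: the function names (`interleaved_bench`, `permutation_pvalue`) are hypothetical, and a permutation test stands in for whatever statistical test the real script uses.

```python
import random
import statistics
import time

def interleaved_bench(control, experiment, reps=20):
    """Alternate control and experiment runs so that machine drift
    (thermal throttling, background load) affects both roughly equally."""
    c_times, e_times = [], []
    for _ in range(reps):
        t0 = time.perf_counter()
        control()
        c_times.append(time.perf_counter() - t0)
        t0 = time.perf_counter()
        experiment()
        e_times.append(time.perf_counter() - t0)
    return c_times, e_times

def permutation_pvalue(a, b, trials=1000, seed=0):
    """Two-sided permutation test: how often does a random relabeling of
    the pooled samples produce a mean difference at least as large as
    the observed one?  Small p => the difference is unlikely to be noise."""
    rng = random.Random(seed)
    observed = abs(statistics.mean(a) - statistics.mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(trials):
        rng.shuffle(pooled)
        left, right = pooled[:len(a)], pooled[len(a):]
        if abs(statistics.mean(left) - statistics.mean(right)) >= observed:
            hits += 1
    return hits / trials
```

Interleaving matters because benchmarking control first and experiment second would attribute any mid-run slowdown entirely to the experiment.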
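The sync hooks and custom sync modes described above can be sketched as follows. This is a hedged sketch, not the script's real API: `SyncHook` and `timed_steps` are invented names, and the no-op base class stands in for backend-specific hooks (a cuda hook would call `torch.cuda.synchronize()`, a lazytensor hook would force the pending trace to execute).

```python
import time

class SyncHook:
    """No-op synchronization point, fine for CPU/eager execution.
    Backend-specific subclasses (cuda, lazytensor) would override sync()
    to wait for all queued async work to finish."""
    def sync(self):
        pass

def timed_steps(step_fn, hook, steps=100, sync_every=1):
    """Time `steps` calls to step_fn under a given sync mode.
    sync_every=1 models 'sync after every step'; a large sync_every models
    'run many async steps before syncing'."""
    start = time.perf_counter()
    for i in range(steps):
        step_fn()
        if (i + 1) % sync_every == 0:
            hook.sync()
    hook.sync()  # final sync so pending async work is included in the total
    return time.perf_counter() - start
```

With this shape, the overhead experiment reduces to one ratio: time the lazy-traced step with a lazy hook, time the eager step with a cuda hook, and report trace time divided by full-sync execution time, which the message above expects to be a small fraction.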