DeepSpeed
4544b7d2 - Improve flops profiler functionality (#1065)

4 years ago
Improve flops profiler functionality (#1065)

* use the original function's name as the key to the old_functions dict
* update the profile output format
* print at global rank 0
* add flops calculation in the backward pass using time from DeepSpeed timers
* improve the aggregated profiling output to show all depths
* print samples/second
* update README and examples
* update docs
* fix typo and reorder printing
* fix format
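Two of the items above describe mechanisms that can be sketched in a few lines. The code below is a minimal illustration under stated assumptions, not DeepSpeed's implementation. First, it wraps a function in a namespace with a FLOP-counting wrapper and stores the original in an `old_functions` dict keyed by the function's name, so it can be restored later. Second, it turns measured forward and backward latencies into rates, assuming the backward pass costs about 2x the forward FLOPs (a common rule of thumb, assumed here). The helpers `patch`, `restore`, `throughput`, the `dot` example and the `flop_log` list are all hypothetical names.

```python
import functools
import types

# Hypothetical registry: original (unpatched) functions keyed by their __name__.
old_functions = {}
# Hypothetical log of (function name, estimated FLOPs) for each patched call.
flop_log = []


def patch(namespace, attr, flops_fn):
    """Swap namespace.attr for a wrapper that logs estimated FLOPs per call."""
    original = getattr(namespace, attr)
    old_functions[original.__name__] = original

    @functools.wraps(original)  # keeps __name__ so restore() can find the key
    def wrapper(*args, **kwargs):
        flop_log.append((original.__name__, flops_fn(*args, **kwargs)))
        return original(*args, **kwargs)

    setattr(namespace, attr, wrapper)


def restore(namespace, attr):
    """Put the original function back, looked up by its name."""
    key = getattr(namespace, attr).__name__
    setattr(namespace, attr, old_functions.pop(key))


def throughput(fwd_flops, fwd_latency, bwd_latency, batch_size):
    """Rates from latencies in seconds; assumes bwd FLOPs ~= 2x fwd FLOPs."""
    bwd_flops = 2 * fwd_flops  # assumption, not a measured value
    return {
        "fwd_flops_per_s": fwd_flops / fwd_latency,
        "bwd_flops_per_s": bwd_flops / bwd_latency,
        # Only fwd + bwd time is counted here; an optimizer step would add more.
        "samples_per_s": batch_size / (fwd_latency + bwd_latency),
    }


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


ops = types.SimpleNamespace(dot=dot)

# A length-n dot product costs n multiplies and n - 1 adds.
patch(ops, "dot", lambda a, b: 2 * len(a) - 1)
result = ops.dot([1, 2, 3], [4, 5, 6])
restore(ops, "dot")

rates = throughput(fwd_flops=100, fwd_latency=1.0, bwd_latency=2.0, batch_size=6)
```

Keying by name lets `restore` look up the original from the wrapper alone, because `functools.wraps` copies `__name__` onto the wrapper.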