pytorch
32461ed3 - Pool cudaEvents in CUDACachingAllocator (#78279)

Committed 3 years ago
Summary:
`cudaEventCreate`/`cudaEventDestroy` can be expensive, especially when the process is making many other CUDA API calls. Pool the `cudaEvent_t` objects so that each event is created once and reused as much as possible.

Test Plan:
Unit tests check the functionality. Manual performance testing shows that this diff improves performance; timings are below.

| | `create_event_internal` (µs) | `free_event_internal`/destructor (µs) | `insert_events` (µs) | `process_events` (µs) |
|---|---|---|---|---|
| baseline | 2.411 | 2.647 | 3.968 | 0.321 |
| this diff | 0.115 | 0.147 | 2.846 | 0.262 |
| speedup | 20.9x | 18.0x | 1.4x | 1.2x |

Differential Revision: D35729059
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78279
Approved by: https://github.com/jianyuh
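The pooling idea in the summary can be illustrated with a minimal C++ sketch. This is not the code from the diff: to keep it runnable without a GPU, `FakeEvent`, `fake_event_create`, and `fake_event_destroy` are hypothetical stand-ins for `cudaEvent_t`, `cudaEventCreate`, and `cudaEventDestroy`, and the global counters only track how often the "expensive" calls happen. A handed-out event is wrapped in a `std::unique_ptr` whose deleter returns it to the pool instead of destroying it, so repeated `get()` calls reuse existing events.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical stand-in for cudaEvent_t (a real pool would hold CUDA event handles).
using FakeEvent = int;

// Count the "expensive" create/destroy calls that pooling is meant to avoid.
static std::size_t g_creates = 0;
static std::size_t g_destroys = 0;

// Hypothetical stand-ins for cudaEventCreate / cudaEventDestroy.
FakeEvent fake_event_create() { ++g_creates; return static_cast<FakeEvent>(g_creates); }
void fake_event_destroy(FakeEvent) { ++g_destroys; }

class EventPool {
 public:
  // A handle whose deleter returns the event to the pool rather than destroying it.
  // Assumption: all handles are released before the pool itself is destroyed.
  using Handle = std::unique_ptr<FakeEvent, std::function<void(FakeEvent*)>>;

  Handle get() {
    std::unique_ptr<FakeEvent> ev;
    {
      // Reuse a cached event if one is available.
      std::lock_guard<std::mutex> lock(mutex_);
      if (!free_.empty()) {
        ev = std::move(free_.back());
        free_.pop_back();
      }
    }
    if (!ev) {
      // Pool is empty: pay the creation cost once.
      ev = std::make_unique<FakeEvent>(fake_event_create());
    }
    return Handle(ev.release(), [this](FakeEvent* e) {
      assert(e != nullptr);
      std::lock_guard<std::mutex> lock(mutex_);
      free_.emplace_back(e);  // back to the pool for the next get()
    });
  }

  // Actually destroy every cached event (e.g. when the allocator cache is emptied).
  void empty_cache() {
    std::lock_guard<std::mutex> lock(mutex_);
    for (auto& e : free_) fake_event_destroy(*e);
    free_.clear();
  }

  ~EventPool() { empty_cache(); }

 private:
  std::mutex mutex_;
  std::vector<std::unique_ptr<FakeEvent>> free_;
};
```

With this shape, two concurrent `get()` calls create two events, and any later `get()` after those handles are released creates nothing new. The per-call cost then drops to a mutex lock and a vector push/pop, which is the kind of saving the `create_event_internal` and `free_event_internal` columns reflect. A real CUDA version would also need to keep events per device, since a CUDA event is tied to the device that was current when it was created.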