CUDACachingAllocator: Keep one event queue per stream (#71745)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/71616
This fixes the leaks in my test case. I have not tested it on our big models yet, but will report back once we do.
This potentially impacts allocator performance: it slightly increases the amount of CPU memory we allocate for bookkeeping, and `process_events` may examine a larger number of events when multiple streams have long-running ops on them.
However, I suspect that in general, either:
- An application isn't using very many streams or very many long-running ops, in which case the performance is essentially the same
- Or, they are, which is precisely the case where https://github.com/pytorch/pytorch/issues/71616 bites you, and so freeing memory faster is probably more valuable than the slight CPU overhead here.
I'm not attached to this approach or any of its details, but figured it was worth throwing up for discussion.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71745
Reviewed By: soulitzer
Differential Revision: D33948288
Pulled By: ngimel
fbshipit-source-id: 73e95f8a9bbe385a77de483d1c58b857b5d84e81
(cherry picked from commit d233719c072341607e6dab226b5cbfe8d316d91f)