Use a single stream for cuda graph pool (#97419)
Previously, we would use the same memory pool but not actually reuse the same memory. Peak memory statistics looked good, but real memory use was much higher because we had a bunch of unallocated segments that could not be reused across streams.
As stated in comments:
NB: the CUDA caching allocator will remember the stream a segment is allocated to
and only allocate that segment to the same stream. We need to use a single stream
for all allocations to the memory pool; otherwise, the allocations to separate streams
will not be reused. Separate recordings would have used the same memory pool, but not
the same memory.
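To illustrate the pattern described above (a sketch, not the PR's actual change): two graph captures can share one memory pool via `torch.cuda.graph_pool_handle()`, and because the caching allocator ties each segment to the stream it was allocated on, both captures must run on the same side stream for the second capture to reuse the first capture's segments. The tensor shapes and stream setup here are illustrative assumptions.

```python
import torch

def capture_two_graphs_sharing_memory():
    # One shared pool id and, crucially, ONE stream for all allocations
    # into that pool; allocating on separate streams would leave segments
    # pinned to their original stream and unreusable by the other capture.
    pool = torch.cuda.graph_pool_handle()
    stream = torch.cuda.Stream()

    g1, g2 = torch.cuda.CUDAGraph(), torch.cuda.CUDAGraph()
    x = torch.ones(4, device="cuda")  # illustrative input

    with torch.cuda.graph(g1, pool=pool, stream=stream):
        y1 = x * 2
    # Second recording on the SAME stream and pool can reuse g1's segments.
    with torch.cuda.graph(g2, pool=pool, stream=stream):
        y2 = x + 1
    return g1, g2

if torch.cuda.is_available():
    capture_two_graphs_sharing_memory()
```

With the pre-fix behavior, each recording's allocations would land on a different stream, so the pool's free segments could not satisfy the next recording's requests even though they shared the same pool id.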
Thanks to @zdevito for help debugging this.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97419
Approved by: https://github.com/ngimel