pytorch
ffdecc1a - [CUDA graphs] Allows DeviceCachingAllocator to capture cross-stream memory use (#55860)

Commit

3 years ago

[CUDA graphs] Allows DeviceCachingAllocator to capture cross-stream memory use (#55860)

Summary:
Safely deallocating and repurposing memory used across streams relies on recording end-of-life events in every stream that used an allocation, other than its original allocation stream. The events are later queried to see whether all GPU work in those extra streams that could have used the allocation is done (from the CPU's perspective) before the allocation is repurposed for use in its original stream.

The trouble is, calling EventQuery on an ordinary event recorded in a capturing stream is illegal. Calling EventQuery while capture is underway is also illegal. So when we call `tensor.record_stream` (or `c10::cuda::CUDACachingAllocator::recordStream`) on any tensor that's used or deleted in or around a capture, we often end up with a confusing error thrown from the cudaEventQuery in DeviceCachingAllocator::process_events().

This PR enables hopefully-safe deletion of tensors used across streams in or around capture with a conservative but simple approach: don't record or process end-of-life events for such tensors until the allocator is sure no captures are underway. You could whiteboard cases where this keeps cross-stream-used allocations unavailable for reuse longer than absolutely necessary, but cross-stream-used allocations are uncommon, so for practical purposes this approach's impact on the memory footprint of captured sequences should be small.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/55860
Reviewed By: ejguan
Differential Revision: D27822557
Pulled By: ezyang
fbshipit-source-id: b2e18a19d83ed05bad67a8157a14a606ed14d04e
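The deferral idea described above can be modeled in a few lines of plain Python. This is only a toy sketch, not the allocator's code: `ToyAllocator`, `Block`, and every method and field name here are hypothetical, and the `finished_streams` argument stands in for what the real allocator would learn from event queries. It assumes that a block used only on its allocation stream can be reused immediately, and that a block used on other streams must wait for one end-of-life event per extra stream.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    """Toy allocation: its allocation stream plus any extra streams that used it."""
    name: str
    alloc_stream: int
    extra_streams: set = field(default_factory=set)
    event_count: int = 0  # outstanding end-of-life events


class ToyAllocator:
    """Hypothetical model of deferring end-of-life events while a capture is underway."""

    def __init__(self):
        self.captures_underway = 0
        self.deferred = []        # freed blocks whose events are not recorded yet
        self.pending_events = []  # (stream, block) pairs awaiting completion
        self.free_blocks = []     # blocks available for reuse

    def record_stream(self, block, stream):
        # Only streams other than the allocation stream need an end-of-life event.
        if stream != block.alloc_stream:
            block.extra_streams.add(stream)

    def begin_capture(self):
        self.captures_underway += 1

    def end_capture(self):
        self.captures_underway -= 1

    def free(self, block):
        if not block.extra_streams:
            self.free_blocks.append(block)  # never used cross-stream
        elif self.captures_underway > 0:
            self.deferred.append(block)     # conservative: record nothing yet
        else:
            self._insert_events(block)

    def _insert_events(self, block):
        # One end-of-life event per extra stream that used the block.
        for stream in sorted(block.extra_streams):
            self.pending_events.append((stream, block))
            block.event_count += 1
        block.extra_streams.clear()

    def process_events(self, finished_streams):
        # Querying events during capture is illegal, so do nothing until it ends.
        if self.captures_underway > 0:
            return
        for block in self.deferred:
            self._insert_events(block)
        self.deferred.clear()
        still_pending = []
        for stream, block in self.pending_events:
            if stream in finished_streams:  # stand-in for a successful event query
                block.event_count -= 1
                if block.event_count == 0:
                    self.free_blocks.append(block)
            else:
                still_pending.append((stream, block))
        self.pending_events = still_pending
```

In this model, a block freed during capture sits in `deferred` and is not reusable until the capture ends and a later `process_events` call sees its extra streams finish. That is the memory-footprint cost the summary describes as small in practice.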

Author

mcarilli

Committer

facebook-github-bot

Parents

3e42da09
